1. Introductory Comments


1.1 Level of the Discussion 


The names “artificial intelligence” and “self-organization” are often
criticized on the grounds that they are prone to the contradictions of
self-reference and other forms of paradox [1] [2]. But most people
agree that “artificial intelligence” and “self-organization” admirably
describe classes of phenomena and kinds of artifact that are nowadays
very often encountered. Consequently, these names are used loosely
to tag the phenomena or artifacts concerned. Any system that simulates
mentation is deemed “artificially intelligent” and any system with a
behavior that becomes more ordered (according to some vague criterion
or another) is called a “self-organizing system.”

Perhaps we cannot be more precise. On the face of it, however, a 
cybernetic demarcation and analysis of these systems are possible and 
potentially valuable. The loose usage is to be deprecated for avoiding
paradoxes (of self-reference and control) which must be tackled to gain
a proper appreciation of what goes on. But the classes of “artificially
intelligent” and “self-organizing” systems are not properly represented 
within the theory of informationally closed systems and the required 
extensions of this theory are, in the first place, logical and ontological 
rather than mathematical [3]. A tentative cybernetic analysis is pre- 
sented as part of our discussion of this field. 


1.2 The Self-Organizing System 


To begin with, in Section 2, we outline the characteristics of a self- 
organizing system [4] and develop the special case of self-organization 
as it appears in connection with automata or computing mechanisms 
which may be either fabricated artifacts or living organisms. Hence our 
discussion is centered upon the property of “learning” and upon
mechanisms that give rise to a “learning” behavior. It is argued that
nontrivial learning behavior is generated by populations of evolving
automata (or, using a distinction due to Loefgren, by unlocalized 
automata) rather than single automata with well-defined inputs and 
outputs (localized automata). On the other hand the system we observe 
is usually a sequence of localized automata. The underlying evolutionary 
process is described by a sequence of relatively static images. 

It is necessary to take the word “learning” quite seriously. “Learning” 
involves more than adaptation. True, some kind of an engram must be 
laid, or some plastic change must occur, as a prerequisite for learning. 
But, in view of the frantic activity that goes on as the concomitant
of perceiving an event, it is difficult to imagine a brain that is not 
modified in some fashion. Minimally, we infer learning from evidence 
of a goal-directed change in a pattern of behavior. Since any consistent 
behavior pattern can be ascribed to a computation carried out by the 
object under scrutiny, learning is inferred from a goal-directed change 
of computation, and the object concerned “learns to compute” in the
sense that its computational repertoire is enlarged as a result of
“learning”. This point of view is consonant with our own intuitions in the
matter and experimental results of the kind obtained by Bartlett [5]. 
We do not learn a poem or a story as a tape recorder might, by register- 
ing its image in some malleable substance. On the contrary, we learn 
the computations required in order to recite the poem or tell the story. 

Further, the learned computation is not so much retained as repro- 
duced. Memory is continual relearning. The distinction has nothing to 
do with dynamic versus static information storage. Either or both can 
be involved in the realization of an automaton. At the moment,
however, we are viewing an automaton as a collection of algorithms
or, at a less detailed level, as a mapping from input to output that 
satisfies this algorithmic specification. Insofar as this abstraction is 
embodied in a brain or a network, the brain or network may be assigned 
a couple of extreme roles. At one extreme it is a telephone exchange, 
perhaps with variable connections, wherein definite parts have definite 
functions to perform. Altogether the functions that are performed 
describe the automaton, and an image of the brain is isomorphic with 
an image of the automaton. At the other extreme the brain acts as 
an internal environment or medium in which patterns of activity and 
constraint are able to develop; for example, patterns of interaction 
between impulse sequences and distributions of synaptic impedances. 
Insofar as memory involves relearning and reproduction, we are invited 
to adopt the latter view of a brain. If the developing organizations are 
identified with automata, these are reproduced in the medium of the 
brain. Their variation in response to internal or external change is a
statement of their evolution (which, behaviorally, is manifest as 
learning). The form of variation is an evolutionary rule (which accounts 
for the goal-directed property of learning behavior). 


1.3 Specific and Distributed Processes 


Neither view of a brain or a network is necessarily more accurate 
than the other. Each is an image of the same physical object. Certainly 
there are regions in most brains and networks that are so profitably 
described by the analogy of a telephone exchange that any other
approach is out of court; for example, it would be absurd to describe 
a reflex arc or an input filter in any other way. On the other hand there 
are many regions that can be described in either fashion according to 
our convenience and the states of the physical object that we deem 
relevant to our enquiries. 

The important point is that learning is a property of such a
description and its behavior, not of the physical object “brain” or “network”.
It is sensible to talk about learning if we have decided to view a brain 
or network as a medium in which a population of automata is evolving. 
Conversely, if our inquiries refer to learning, this image of the brain or 
network is the most convenient, and its states are most readily identified 
with states of the physical object. One way to assert that this decision 
has been made and that an object capable of acting as a medium for 
evolution is being observed is to say that we are considering a self- 
organizing system. 


1.4 Systems with Artificial Intelligence 


In Section 3 we deal with artificial intelligence. The contention is
that artificial intelligence is a special property of a self-organizing 
system. In particular, a system has artificial intelligence if it learns in 
much the same way that we learn and about much the same universe 
of discourse. This definition automatically excludes cleverly designed 
calculators and appears compatible with the spirit of present-day 
research in this field, though it implies rather more stringent criteria 
for intelligence than are commonly adopted and lays the emphasis 
upon dynamic characteristics (which are sampled in a test for intelli- 
gence) rather than capabilities that might be inferred from knowledge 
about the structure of a system. 

Like any self-organizing system, an artificial intelligence is a con- 
troller and (as we argue in the main discussion) it has an hierarchical 
structure wherein there are levels of control that aim to achieve different 
levels of goal. However, if we call the system intelligent, its control 
activities are necessarily termed “problem solving” and the stable
states achieved as a result of these control activities are termed “problem 
solutions.” 

A little more is involved than the idiom of the field. We are at liberty 
to identify the states of a learning system with signs and to call any 
sign and its denotation a symbol. But if the system learns as we learn, 
then we are forced to regard it as operating upon symbols that denote 
the perturbations we choose to call problems. Further, we must 
countenance symbolic operations, at a higher level in the control
hierarchy, that act upon and transform symbols alone. 

So far, we have renamed an hierarchy of control, calling it an hierarchy 
of symbolic descriptions (which appears in the main discussion as an 
hierarchy of metalanguages). If it learns as we learn, it must perform 
modifications that we find familiar. (Unless we test for this capability
we cannot discern an intelligent machine.)

Let us define a concept as the process whereby a symbol is assigned to 
a state of a description of the environment within its denotation (or to 
a set of symbols, in a description of a system’s internal state). The ac- 
quisition of a concept is a process whereby the concept (itself a process) 
is learned. Although any system that learns can be said to use, and 
possibly to acquire, “concepts” in a broad and rather dubious sense, an
artificial intelligence must use concepts like our own and it must 
acquire concepts in much the same way that we acquire them. 

Hence a study of artificial intelligence is chiefly concerned with the 
dynamics of a system of symbols whereas a study of self-organization 
also involves the underlying organization, states of which are identified 
with these symbols. Further, in considering artificial intelligence, 
microscopic semantic processes are important (whereas self-organization 
is a macroscopic property of physical assemblies). It is necessary to 
distinguish between perceptual and motor regions, for example, or 
between different kinds of problem-solving algorithms, and a great 
deal of the discussion involves a more detailed review of systems that 
have been previously considered as self-organizing systems. 


1.5 The Relevance of Brains 


Since artificial intelligence resembles our own mentation, the workings 
of a human brain have an obvious relevance to the design of an intel- 
ligent machine. This aspect of the matter is examined in Section 4 
which deals with various physiological and psychological models of 
human learning and concept acquisition. 


1.6 Heuristics and Symbiotic Interaction 


One outstanding feature in the design of intelligent machines is the 
role of “heuristics” or broad “patterns” of action (methods of proving
hypotheses, for example, or criteria of similarity) that are part of the 
specification but which stem from human experience in problem solving. 
There is a very real issue of the extent to which an artificial intelligence 
can be independent of a human intelligence. At the moment, it cannot 
be. Coupling between the two may, as suggested above, depend upon 
heuristics that are vehicles for injecting some wisdom into the artifact.
Alternatively the wisdom can be gained through interaction with and 
experience of a man or a group of men. In Section 5 we consider this kind 
of man-machine interaction, both from the viewpoint of machine 
education and from the diametrically opposite viewpoint of extending 
the capabilities of a man and controlling his learning process. In fact, 
Section 5 is devoted to symbiotic interaction between men and machines 
(which can be contrasted with modes of interaction in which the machine 
is regarded as a tool). Conversation is a typical symbiotic relationship.
The crucial test for symbiosis is the production of a joint concept 
(arising from interaction between the participants but which cannot 
be ascribed to either of the participants alone). Whereas concept 
acquisition within an intelligent machine entails the internal construc- 
tion of some element in a descriptive language, this process is exterior- 
ized in a symbiotic man-machine interaction and is evidenced by the 
construction of a common language. 


1.7 Descriptive Languages 


Since linguistic arguments prove essential in the analysis of artificial 
intelligence, the discussion of self-organization has also been phrased 
in terms of the languages used in describing and performing experiments 
upon a learning process. We could, of course, have avoided any mention 
of “language” until Section 3 (because, as pointed out in Section 2,
a “system” is isomorphic with an object language and its denotation).
One advantage of adopting the more elaborate formulation (apart from
consistency) is that properties like “learning” can be shown to depend
upon the relation between a physical object and the observer’s descrip- 
tive language instead of depending upon a relation that entails the 
personal oddities of a particular observer. 


1.8 Localized and Unlocalized Automata 


Automata are completely abstract entities. But all interesting auto- 
mata are realized as physical structures and appear as organizations 
that are a property of some physical medium. The material dependence 
of these abstract entities is particularly important because of the 
dominant role assumed in our discussion by localized and unlocalized 
automata and the distinction of one class from the other. It will be 
wise to keep a tangible realization of each class in mind. 

Since we are considering the real world, a computing machine is not 
a typical localized automaton. It has an aura of permanence which 
belies the fact that any localized automaton, open to the structural 
perturbations of the real world, has a finite life span. A better exemplar,
perhaps, is an ape in a cage. The creature is a physically open system 
with a metabolism that preserves its structure. Informational closure 
can, however, be approximated (though, typically enough, internal 
information transfer is mediated by an autonomous activity that con- 
tinues while the animal survives). Since the ape has a consistent 
behavior pattern in a given environment, it computes a response as a 
function of its input stimuli (with its internal state as a parameter of 
this function). Finally, since we have some idea of the goals that apes 
aim for, and since we know the stimuli that count as signs, we can 
observe the goal-directed changes of behavior pattern that characterize 
learning. For the purpose of this analogy, the inputs and outputs of an 
ape are well-defined, so it is a localized automaton; but however well 
it is fed, the ape, like any other localized automaton, has a finite life 
span. It cannot survive indefinitely. 

The paradigm case of an unlocalized automaton is a cage containing 
a well-nourished and reproducing population of apes together with a 
signaling arrangement (a flashing lamp or a buzzer) which allows the 
experimenter to stimulate the population and some method of discerning 
the response of a typical member of this population (by recognizing 
a behavior pattern manifest by the majority of individuals). The 
individual apes are certainly automata, and the aggregate of apes is 
also a parallel computing machine. The input and output of the system 
are not, however, defined with reference to the individual that actually 
carries out the computation; hence, the parallel computing machine
realized by the population is representable as an unlocalized automaton. 
In common with a subset of unlocalized automata, which has been 
shown by Loefgren [6] to exist, the population of apes may have an 
indefinite life span. 
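
The contrast between the two exemplars can be put in executable form. What follows is only a minimal sketch under loud assumptions: the class names `Ape` and `Population`, the stimulus and response labels, the fixed life span of five steps, and the rule that a dying ape is replaced by one newborn inheriting its rule are all hypothetical devices introduced for illustration, not part of the original discussion.

```python
from collections import Counter

class Ape:
    """A localized automaton: fixed stimulus-response rule, finite life span."""
    def __init__(self, rule, life_span, age=0):
        self.rule, self.life_span, self.age = rule, life_span, age

    def respond(self, stimulus):
        return self.rule[stimulus]

class Population:
    """An unlocalized automaton: the response is read off the majority."""
    def __init__(self, apes):
        self.apes = apes

    def step(self):
        # Every ape ages; an ape at the end of its life span is replaced by
        # one newborn that inherits its rule (a crude stand-in for reproduction).
        for i, ape in enumerate(self.apes):
            ape.age += 1
            if ape.age >= ape.life_span:
                self.apes[i] = Ape(ape.rule, ape.life_span)

    def respond(self, stimulus):
        # The input and output are defined for the cage, not for any one ape.
        votes = Counter(ape.respond(stimulus) for ape in self.apes)
        return votes.most_common(1)[0][0]

rule = {"buzzer": "approach", "lamp": "withdraw"}
cage = Population([Ape(rule, life_span=5, age=a) for a in range(5)])
for _ in range(100):  # twenty times the life span of any individual
    cage.step()
print(cage.respond("buzzer"))
```

No individual in the cage at the end of the run was present at the start, yet the population still computes the same response; the computation is not tied to the individual that happens to carry it out.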

Our exemplar becomes more plausible and less trivial if we allow 
overt cooperative interaction between the apes. (There is an implicit 
competitive interaction, in any case, due to the food supply limitation 
and the finite boundaries of the cage.) We shall assume that the 
individuals interact (and may cooperate) through a system of signs, and 
normally these signs are precisely the signs that we use when stimulating 
the population and detecting its typical response. Indeed, we should 
aim to interact with the population in terms of the same language that 
is used for internal communication. 

To push the analogy one stage further, it would be possible to insist 
that the apes did cooperate by providing a form of environment in 
which an ape could only survive (receive sufficient nourishment) if it 
cooperated with other apes. (Hence, creatures that survive are forced 

GORDON PASK 


to communicate in order to maintain cooperative interaction.) In 
artifacts that consist of a medium in which organizations evolve, this 
constraint is always applied. 


2. The Characterization and Behavior of a 
Self-Organizing System 


2.1 Various Definitions 


A “system” is not “self-organizing” in a completely unqualified
sense. Any suggestion that it is can be countered by several ingenious
arguments to show that no such thing exists. In fact, the concept of
“self-organization” is rightly applied to a relation that exists between
an observer and the object he observes. “X” is “self-organizing”
insofar as its activity leads a sane observer to assert this property of
“X” and his relation to “X.” By way of a definition, when we say that
“X” is a self-organizing system we mean (i) that “X” appears to become
more organized and (ii) that as we observe it “X” forces us to revise
our idea of “organization,” or to reselect the “frame of reference”
(a system “structure”) in which this organization appears (and in
which it is occasionally measured). The revision is necessary when
observing “X”, in order to keep track of the behavior of “X” and to render
a coherent account of it in our own “scientific” language.

Wiener [7] [8], Beer [9], Mesarovic [10] [11], and von Foerster [4]
have given strict and consonant definitions of a self-organizing system.
The term is wedded to the field of control mechanism theory by
axiomatic structures such as Pun’s [12] and it is used rather more
broadly in connection with Bertalanffy’s [13] abstract system theory.
For the present purpose we shall use von Foerster’s definition in which 
Shannon’s [14] measure, redundancy, is used as an index of organization 
and according to which a system is a self-organizing system if and only 
if the rate of change of its behavioral redundancy, R, is always positive. 
Formally, if $H_{\max}$ is the maximum informational entropy or variety
(a function of the possible states, in the state description of the system)
and if H is the informational entropy or the variety of its behavior
(a function of the states which are occupied throughout an observation)
then, from Shannon,

$$R = 1 - \frac{H}{H_{\max}}$$

and von Foerster requires that

$$\frac{dR}{dt} > 0 \tag{1}$$

for any self-organizing system. It is readily shown that (1) is satisfied,
providing the inequality

$$H \frac{dH_{\max}}{dt} > H_{\max} \frac{dH}{dt} \tag{2}$$

holds true.
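
The step from (1) to (2) can be checked by differentiating the redundancy directly. The derivation below is a sketch that assumes H and the maximum entropy are both differentiable functions of time and that the maximum entropy is strictly positive; neither assumption is stated explicitly in the text.

```latex
\begin{align*}
\frac{dR}{dt}
  &= \frac{d}{dt}\left(1 - \frac{H}{H_{\max}}\right)
   = -\frac{1}{H_{\max}}\frac{dH}{dt}
     + \frac{H}{H_{\max}^{2}}\frac{dH_{\max}}{dt} \\
  &= \frac{1}{H_{\max}^{2}}
     \left( H\,\frac{dH_{\max}}{dt} - H_{\max}\,\frac{dH}{dt} \right).
\end{align*}
```

Since the squared maximum entropy is positive, dR/dt > 0 holds exactly when the bracketed term is positive, which is inequality (2).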


Several cases are considered in the original paper. Adaptation
corresponds to the case when $H_{\max}$ is held constant and $-dH/dt > 0$, so
that dR/dt > 0 also. There are also systems embodying some developmental
process which increases the number of elements to be considered in a
state description while maintaining H as a constant, and in this case
dR/dt > 0 because $dH_{\max}/dt > 0$. Finally these special cases of (2)
may be combined to yield rather plausible images of growth
accompanied by differentiation, of the kind encountered in the development
of populations and of embryos.
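
A small numerical illustration of the two special cases may help. The sketch below assumes that H is estimated from the relative frequencies of the states occupied during an observation and that the maximum entropy is the logarithm (base 2) of the number of possible states; the function names `entropy` and `redundancy` and the sample state sequences are hypothetical.

```python
import math
from collections import Counter

def entropy(observed_states):
    """Shannon entropy in bits of the states occupied during an observation."""
    counts = Counter(observed_states)
    total = len(observed_states)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def redundancy(observed_states, n_possible_states):
    """R = 1 - H / H_max, taking H_max = log2(number of possible states)."""
    h_max = math.log2(n_possible_states)
    return 1.0 - entropy(observed_states) / h_max

early = list("abcd") * 5   # behavior spread evenly over four states
late = list("ab") * 10     # behavior confined to two of them

# Adaptation: H_max fixed (four possible states), H falls, so R rises.
r_before, r_after = redundancy(early, 4), redundancy(late, 4)

# Development: H fixed (two states occupied), H_max grows, so R rises.
r_small, r_grown = redundancy(late, 4), redundancy(late, 16)

print(r_before, r_after, r_small, r_grown)
```

In both comparisons the redundancy increases, once because the behavior becomes less varied and once because the state description becomes larger while the behavior does not.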

To appreciate this formulation we must emphasize that a “system”
is not, in itself, a matter of fact, such as a physical object. It is an
abstract model, constructed in an observer’s descriptive language L*
(often, though not always, a “scientific” language) which has been
identified with the physical object (by specifying procedures for
observation and measurement and other procedures for parametric
adjustment). The bare bones of a system describe its possible states and their
structure, hence a framework specified in L* that limits the set of
hypotheses that can be posed and the relevant measurements that can
be made. The behavior of the system is a sequence of states.
Observations of this behavior, sometimes contingent upon a particular
manipulation of the system parameters, provide evidence to validate or deny
the hypotheses that have been posed. The measures R, H, and $H_{\max}$
are, of course, determined with reference to the basic structural
framework and must be redefined whenever it is changed.

It is not difficult to show that in some circumstances an observer is 
impelled to change his frame of reference in order to maintain the 
joint consistency and relevance of his observations. Thus the embryol- 
ogist has every right to regard the embryo as the relevant object of his 
investigations; but in order to make sense of it he is bound to perform 
experiments which are (or were until quite recently) formally
incomparable. (The first experiments entail state descriptions of cells, the
next entail state descriptions of tissues, and so on.) Similarly a
psychologist has every right to address his enquiries to an individual baby;
but, in order to do so, he must examine an apparently disconnected 
sequence of behaviors that refer to whatever bits of the environment 
happen to occupy the baby’s attention. 

There are many cases where the behavior in each member of a 
sequence of systems reveals an increasing degree of organization, due
to changes in H or $H_{\max}$, or both. It is often also true that the sequence
that is generated by successive redefinition has no limit apart from the 
arbitrary demarcation between disciplines; for example, when the 
embryo becomes a baby, embryological inquiries give place to psycho- 
logical inquiries. (We comment that, even if an observer insisted upon 
maintaining the original state description, his observations would 
become uninformative. Even if the baby is described in terms of the 
states of its cells the resulting description is not pertinent to psycho- 
logical inquiries. We need not argue the issue of reduction between 
different levels of hypotheses. At the present state of knowledge, cell 
states may be used to predict cell states, but cannot be used to predict 
the decisions of an organism.) 

Now the whole sequence can be justified in L* by a statement like, 
“I am looking at the physical object called an embryo which, for all its
changes, I take to be a coherent entity,” or like, “I am looking at a 
baby.” The justification rests upon the fact that other observers, using 
L*, understand and agree with these statements which (because they 
have a higher logical class) are metastatements about the sequence of 
systems and have no direct connection with the observations that are 
made within the systems, although their cogency may be supported 
by the behavioral evidence. These metastatements associate the 
sequence of systems and allow us to regard them as a whole. In partic- 
ular, they legitimize an organization created as the disjunction of the 
several different systems which (providing that dR/dt > 0 for each of 
its component systems) is called a self-organizing system. Indeed, it 
can be argued that all nontrivial self-organizing systems have this form 
and are thus compatible with our original dictum. Any growth, for 
example, forces us to redefine the growing system unless it is uniform 
growth (as in the case of crystal growth which constitutes a trivial 
case of self-organization since the process could be accounted for, using 
a more competent state description, in terms of a simple rule). 


2.2 Special Case 


For the present discussion we shall deal exclusively with a special 
kind of self-organizing system encountered in the observation of learn- 
ing and the interaction between information processing structures. The 
self-organizing system is manifest as a sequence of adaptive systems, 
each of which describes a localized and adaptive automaton coupled to 
other automata or an experimental environment. (Localized automata
are automata with well-defined and finite sets of input states and
output states.) In practice the adaptive automata may be identified with
an image of the computations carried on by a human being or an animal 
or a machine. Normally, the adaptive process is directed, in the sense 
that the human being adjusts his behavior to maintain or maximize 
some experimentally determined reinforcement, and the machine is 
designed to perform whatever computation maximizes the value of a 
variable that describes the state of its environment. Hence, the adap- 
tive automaton can always be viewed as a control mechanism. 

The restrictions entailed by considering a finite set of outcome states 
(input-output state pairs or equivalently stimulus-response state pairs), 
over any finite interval, are no more severe than the restrictions that 
are tacitly assumed in all behavioral experiments. We shall not examine 
the origin of these constraints in detail, but comment that they may 
be interpreted either (i) as due to the fact that a man, other organism, 
or machine is characterized by a finite “computational capacity” and 
a quantized outcome space or “field of attention” that remains invariant
and is contemplated for at least a minimum finite interval, or (ii) as due
to a constraint upon our own methods of observation, akin to Caianiello’s
[15] “adiabatic condition” that forces us to consider a behavior in
terms of an invariant set of alternatives. 


2.3 A Model 


We shall approach the issues of nontrivial self-organization through 
a model or simulation of a self-organizing system that can be built from 
more familiar automata, which we shall, in any case, need to consider 
at other points in the discussion. In the first place we consider an 
adaptive probabilistic machine and show that, although it can act as a 
self-organizing system over a short interval, it is essentially instable. 
A collection of such automata, combined with an over-all selective
mechanism, prolongs the stable mode of this simulation; but in order
to produce an indefinitely stable self-organizing system it is necessary
to introduce an underlying evolutionary process. The simulation has
been carried out [16] on a special purpose computer called the EUCRATES
system [17] as part of an investigation of learning and attention and, 
while trivial as a learning device, this model illustrates the difficulties 
embedded in the concept of a self-organizing system. 


2.4 Formal Representation 


(1) A finite informationally closed system $S_0$, defined in L*, is specified
by its state description, and certain initial constraints upon the possible
changes of state. Suppose that $\omega \in \Omega$ are the most primitive states that
can be distinguished in L*. A finite state description is a mapping from
a subset $\Omega_0$ of $\Omega$ onto points $c$ in a space of attributes $C^*$. Consider a
quantization of this space that determines a further set of discrete
valued variables $u^*$ (which are the variables in an abstract model
defined in L*) and states $u \in [u_1^*, u_2^*, \ldots, u_M^*]$. (We say that the system
$S_0$, which embodies this abstract model, is in state $u$ if $c \in u$.) Call the
mapping $\Omega_0 \to U$ a description. An automaton defined in $S_0$ is a further
mapping F of the form


$$F : [u_1^*, \ldots, u_m^*] \to [u_{m+1}^*, \ldots, u_n^*], \qquad n \le M,$$

where the product set $[u_1^*, \ldots, u_m^*]$ is called the input set and the
product set $[u_{m+1}^*, \ldots, u_n^*]$ is called the output set. It is convenient to
rename these sets

$$X = [u_1^*, \ldots, u_m^*], \quad x \in X \ \text{(input states)},$$
$$Y = [u_{m+1}^*, \ldots, u_n^*], \quad y \in Y \ \text{(output states)},$$
$$U_F = (X, Y) \subset U.$$

Thus these inputs and outputs define a projection from $U_F$ onto the
X coordinate of $U_F$ and onto the Y coordinate of $U_F$. The formulation
is consonant with the work of Ashby [18], Loefgren [19], and Rosen 
[20]. 


(2) A fixed automaton computes a function

$$y = f(x), \tag{3}$$

or, if time $t$ is quantized and if we adopt the notation $u|t$, $x|t$, and $y|t$
for the states selected at times $t = 1, 2, \ldots$, then the above relation is
interpreted as

$$y|t+1 = f(x|t).$$


The input of this fixed automaton may be manipulated by and its
output may act upon some external entity, such as the observer or an
instrument. On the other hand, if n < M, its environment may also be
specified in $S_0$. In this case the states of the environment will be
$u \in [U - U_F]$, and the coupling between the automaton and its
environment will be defined by a pair of mappings

$$A : Y \to [U - U_F] \quad \text{and} \quad B : [U - U_F] \to X.$$

Normally the behavior of the environment will be defined by a relation
$x = f^*(\bar{y})$, where $\bar{y}$ is a finite sequence of selections of $y \in Y$, when the
environment may be considered as a further automaton. In any case
the automaton and its environment represent a pair of coupled
subsystems as in Fig. 1. We comment that, if n = M, either $S_0$ is not
completely closed, or the automaton is sessile, or it has a cyclic pattern
of activity.

[Fig. 1. A controller. Labels: The Controller; The Environment; Parametric Coupling; $f_\phi$; Varying Value of $\phi$.]
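
The coupling of a fixed automaton to its environment can be sketched as a simple loop. This is an illustration under stated assumptions: the helper name `run_coupled` is hypothetical, the environment relation f* is simplified so that it depends only on the most recent output rather than on a finite sequence of outputs, and the three-state example functions are invented for the purpose.

```python
def run_coupled(f, f_star, x0, steps):
    """Iterate y|t+1 = f(x|t) with the environment returning the next input."""
    x = x0
    history = []
    for _ in range(steps):
        y = f(x)               # automaton: output selected from current input
        history.append((x, y)) # record the pair (x|t, y|t+1)
        x = f_star(y)          # environment: next input selected from output
    return history

# Hypothetical closed pair over the states {0, 1, 2}.
f = lambda x: (x + 1) % 3      # the fixed automaton
f_star = lambda y: y           # the environment hands the output straight back

history = run_coupled(f, f_star, x0=0, steps=6)
print(history)
```

Because the pair is closed and has only finitely many states, the sequence of outcomes must eventually repeat; here it settles at once into a cycle of three outcome pairs.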

(3) A variable automaton is capable of computing several functions
$f_\phi$, according to the value of a parameter $\phi$ which indexes the selection of
$f_\phi$ from a set F of functions. Hence

$$y = f_\phi(x). \tag{4}$$
The usual formulation images $\phi$ as the output of an over-all controller
computing the function

$$\phi = g(x, y) \tag{5}$$

and if g has a directed property in the sense that the variable automaton
obtained from (4) and (5) as

$$y = f_{g(x, y)}(x)$$

maximizes the value of a payoff function $\theta$, defined over the domain of
the states of its environment, then it is called an adaptive automaton,
which involves an hierarchy of control. Fig. 2 illustrates the structural
consequence of an hierarchy of control. Its mathematical origin is the
fact that $g(x, y)$ is a function of higher logical order than the $f_\phi(x)$
which it selects (and this, of course, determines one level of organization
in the hierarchy of control). If the adaptive automaton and its
environment are both defined in $S_0$, the system is closed and the payoff function
will depend upon states $u$ or finite sequences of states $\bar{u}$. If $S_0$ is not
closed, $\theta$ may be an arbitrary reinforcement. Further, if the adaptive
automaton and its environment are both specified in $S_0$, convergence
of the $\phi$ values is guaranteed, $\phi_1 \to \phi_2 \to \cdots \to T$, where T is either a
value of $\phi$ or a cycle of values with the characteristic that it will
maximize $\theta(\bar{u})$.

[Fig. 2. An adaptive controller.]
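
The two levels of control can be illustrated with a short sketch. Everything specific in it is an assumption made for illustration: the function name `adapt`, the choice of an exhaustive comparison as the over-all controller, the three candidate functions, the identity environment, and the payoff that rewards closeness to the state 3. It is not offered as the formulation used in the text, only as one simple instance of a controller that selects the parameter so as to raise a payoff.

```python
def adapt(functions, environment, payoff, x0, phi0, rounds):
    """Hypothetical adaptive automaton: an over-all controller re-selects phi."""
    phi, x = phi0, x0
    trajectory = [phi]
    for _ in range(rounds):
        # Higher level: compare the payoff each candidate function would earn
        # on the current input and select the index of the best one.
        phi = max(range(len(functions)),
                  key=lambda p: payoff(environment(functions[p](x))))
        y = functions[phi](x)  # lower level: the selected function computes y
        x = environment(y)     # the environment returns the next input
        trajectory.append(phi)
    return trajectory, x

functions = [lambda x: x, lambda x: x + 1, lambda x: x - 1]  # hold, up, down
environment = lambda y: y                # assumed: output fed straight back
payoff = lambda u: -abs(u - 3)           # assumed: best environment state is 3

trajectory, final_x = adapt(functions, environment, payoff,
                            x0=0, phi0=0, rounds=6)
print(trajectory, final_x)
```

The selected index moves through a short sequence of values and then stays at the one that holds the environment in its best state, which is a toy version of the convergence described above.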

Ashby points out that the least specific control criterion (it is hardly 
fair to call its measure a “‘payoff function’) is stability, and he demon- 
strated that any dynamic and informationally closed system will 
approach a stable state. (This may be a point equilibrium or a cyclic 
oscillation which is repeated, in which case, the terminal condition is a 
dynamic equilibrium.) If a subsystem can reach several stable states 
from a given starting state, the particular terminal condition depending 
upon the value of a parameter, then the system that includes this 
subsystem involves an hierarchy of control and Ashby calls it
ultrastable [21].

The corresponding paradigm for the case of an incompletely closed
system where $\theta$ is a reinforcement variable entails the idea of “survival.”
$S_0$ is defined, or its mechanical representation survives, if and only if
certain physical conditions, indexed by some of the $u^*$, have been
satisfied. For a biological system the critical conditions are conveniently
described as limits upon essential variables like body temperature. We
stipulate that an organism survives if and only if the values of these
variables remain between these limits, hence the corresponding system
is defined if and only if certain of the $u^*$, indexing essential variables,
have values between $u^*_{\min}$ and $u^*_{\max}$. The parameter changes in
ultrastability may occur as a consequence of an over-all controller sensing
the fact that $u^*|t$ is in the neighborhood of $u^*_{\max}$ or $u^*_{\min}$.

To demonstrate this point, Ashby built a device called the “homeostat”
[27]. It consists of four interconnected positional servomechanisms
and an over-all controller which is an arbitrarily determined and
preprogrammed number selector. A particular plan of interconnection
between the servomechanisms sets up a velocity coupling and (together
with the transfer functions of each device) determines a function
$f_\zeta$, where the index $\zeta$ is the number of the interconnection plan
(and the number selected at this instant by the over-all controller). The
positional outputs are interpreted as the “essential variables” $u^*$.
Limit indicators on the output potentiometers determine “critical values”
$u^*_{\max}$ and $u^*_{\min}$, and if $u^* > u^*_{\max}$ or
$u^*_{\min} > u^*$ then a signal is delivered to the over-all controller
which selects the next number, $\zeta$, on its preprogrammed list.

Assume that the homeostat is stable, given a plan $\zeta_1$, and that no
$u^*$ contravenes the limit condition for “essential variables.” Suppose
that one of the positional output potentiometers is arbitrarily disturbed
to provide environmental input. The homeostat may return to its
equilibrial state or it may become instable, in the sense that either
$u^* > u^*_{\max}$ or $u^*_{\min} > u^*$. In this case the over-all
controller will select a number $\zeta_2$ from the list. If stability is
achieved, given that $\zeta = \zeta_2$, no further change occurs. If not,
another value of $\zeta$ is selected. Since the homeostat is designed so
that some value of $\zeta$ will induce stability, against any perturbation
in the experimental repertoire, it always survives.
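
The selection cycle just described can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions and not a model of Ashby's hardware: the servomechanisms are replaced by a linear system du/dt = Au, each interconnection plan is a hypothetical matrix A, the limits are taken as symmetric (±`limit`), and a variable that crosses a limit is held at its end stop. The names `simulate_homeostat`, `plans`, and `limit` are ours.

```python
def simulate_homeostat(plans, u0, limit=1.0, steps=200, dt=0.1):
    """Step du/dt = A u under the current plan; change plan when a limit is crossed.

    plans : list of square matrices (lists of rows), the preprogrammed list
    u0    : initial values of the essential variables
    Returns (index of the final plan, final state, number of plan changes).
    """
    zeta = 0                      # index of the plan currently selected
    u = list(u0)
    changes = 0
    n = len(u)
    for _ in range(steps):
        A = plans[zeta]
        # one Euler step of the coupled (assumed linear) servomechanisms
        u = [u[i] + dt * sum(A[i][k] * u[k] for k in range(n)) for i in range(n)]
        if any(abs(v) > limit for v in u):
            # a limit indicator has fired: take the next plan on the list
            zeta = (zeta + 1) % len(plans)
            changes += 1
            # assumption: a potentiometer cannot travel past its end stop
            u = [max(-limit, min(limit, v)) for v in u]
    return zeta, u, changes


unstable = [[0.5, 0.0], [0.0, 0.5]]    # every disturbance grows
stable = [[-1.0, 0.0], [0.0, -1.0]]    # every disturbance decays
final_plan, final_state, changes = simulate_homeostat([unstable, stable], [0.5, -0.2])
```

With these two hypothetical plans the disturbance grows under the first plan until a limit is crossed, the controller steps once to the second plan, and the variables then decay toward equilibrium.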

Haire, Harouless [22], and Williams [23] have recently made a 
much larger homeostat and have extended the work done by Ashby 
on the original device. Chichinadze [24] has constructed a homeostatic 
model with a memory capability, and Tarjan [25] has extended
Lyapunov stability to such cases.

Similar comments apply when the computations performed by an
automaton are “probabilistic.” The automaton (and possibly also the
observer who constructs $\Sigma$ in terms of $L^*$) has access only to state
probabilities $\Pi(x_i) = \Pi_i$, $x_i \in X$. The appearance of an input
state (which, for this purpose, we call $x_i^*$) conveys imperfect evidence
concerning the existence of $x_i$. Alternatively (or in addition) the state
changes of the environment can only be defined “probabilistically” so that
$f^*$ is replaced by a finite matrix
$\Pi = \|\Pi(x_i \mid x_j)\| = \|\Pi_{ij}\|$. Uttley [26], [27], [28], [29]
was the first to design a conditional probability machine able to estimate
values of $\Pi(x_i \mid x_j)$ and $\Pi(x_i)$ as numbers
$p(x_i^* \mid x_j^*)$ and $p(x_i^*)$. Such a machine infers the existence of
$x_i \in X$ even if $x_j^*$ is the input state, providing that the value of
$p(x_i^* \mid x_j^*)$ exceeds some arbitrary limit embedded in the design.
Similarly, conditional probability machines, like Steinbuch’s [30]
“learning matrix” and a device due to Katz and Thomas [31], which are
related to or derived from Uttley’s work, can make “probabilistic”
selections from their output states. (The term “probabilistic” is very
tenuous since an observer need not remain ignorant of an automaton he has
specified.)
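
The estimating and inferring behavior attributed to a conditional probability machine can be illustrated with a small counting sketch. It is not Uttley's design: we assume that probabilities are estimated as relative frequencies of co-occurrence, and the class name, the `limit` parameter, and the method names are hypothetical.

```python
from collections import Counter


class ConditionalProbabilityMachine:
    """Estimate p(x_i | x_j) by counting and infer x_i when it exceeds a limit."""

    def __init__(self, limit=0.8):
        self.limit = limit          # the arbitrary limit embedded in the design
        self.single = Counter()     # how often each state has appeared
        self.joint = Counter()      # how often state i appeared together with j

    def observe(self, states):
        present = set(states)
        for j in present:
            self.single[j] += 1
            for i in present:
                if i != j:
                    self.joint[(i, j)] += 1

    def p(self, i, given):
        n = self.single[given]
        return self.joint[(i, given)] / n if n else 0.0

    def infer(self, given):
        """States whose estimated conditional probability exceeds the limit."""
        return {i for (i, j) in self.joint
                if j == given and self.p(i, given) > self.limit}


machine = ConditionalProbabilityMachine(limit=0.8)
for _ in range(9):
    machine.observe(["a", "b"])
machine.observe(["b"])
```

After these ten observations the machine estimates p(a | b) as 9/10, so presenting "b" alone leads it to infer "a" as well.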

In fact, the automaton is provided with (or, as later, it can generate)
“probabilistic” values $1 \ge p(y_j \mid x_i) \ge 0$. (We omit the $x^*$
notation and assume that input states are accurately determined although
$f^*$ is not.) An $M$ component vector of these weights ($\|p_{ij}\|$ given
that $x_i$ is selected from $X$) is presented as a bias to a process that is
independent of any other aspect of the system, but which may, of course, be
specified as part of $\Sigma$ in $L^*$. The output of this process is
illustrated in Fig. 3 as an index value $\xi$ which selects a function
$f_\xi$ from a set $F$. The set $F$ contains $M$ subsets corresponding to
the $M$ output states and $\|p_{ij}\|$ biases the so-called “chance” process
in such a way that $p_{ij}$ is the “chance” that the output state
$y_j \in Y$ is selected given $x_i \in X$ or, equivalently, that $f_\xi$ is
included in the corresponding subset. Consequently this automaton can be
represented by the relations

$$y = f_\xi(x), \qquad \xi = \mathrm{Chance}(p(y \mid x), x) = \mathrm{Chance}(x_i, p_{ij}), \tag{6}$$

although it is often convenient to express this in the form

$$p(y) = p(x)\,P,$$

which implies

$$p(y)|_{t+1} = p(x)|_t\,P = p(y)|_t\,\Pi\,P, \tag{7}$$

where $p(y)$ is an $M$ component output state probability vector and
where $P = \|p_{ij}\|$ and $\Pi = \|\Pi_{ij}\|$ are the state transition
probability matrices that characterize the automaton and its environment.

Fig. 3. Probabilistic machine.
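
As a hedged illustration of the two descriptions above, the following sketch draws an output state with a biased “chance” process and, separately, propagates a probability vector through the matrix P. The function names `chance` and `output_distribution` and the numerical entries of `P` are assumptions made for the example.

```python
import random


def chance(p_row, rng):
    """Select an output index j with probability p_row[j] (the biased chance process)."""
    return rng.choices(range(len(p_row)), weights=p_row, k=1)[0]


def output_distribution(p_x, P):
    """Row vector times matrix: p(y)_j = sum over i of p(x)_i * P[i][j]."""
    return [sum(p_x[i] * P[i][j] for i in range(len(p_x)))
            for j in range(len(P[0]))]


P = [[0.9, 0.1],      # row i holds p(y_j | x_i); the values are hypothetical
     [0.2, 0.8]]
p_y = output_distribution([0.5, 0.5], P)

rng = random.Random(0)                    # fixed seed so the run is repeatable
draws = [chance(P[0], rng) for _ in range(10000)]
freq_y0 = draws.count(0) / len(draws)     # should lie close to P[0][0]
```

The sampled frequency approximates the corresponding entry of P, while `output_distribution` gives the exact vector that the sampling estimates.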


(5) In the adaptive form of probabilistic automaton the functions
computed are doubly indexed:

$$y = f_{\xi\zeta}(x), \qquad \xi = \mathrm{Chance}(p(y \mid x), x) = \mathrm{Chance}(x_i, p_{ij}), \qquad \zeta = g(x, y) \ \text{such that}\ \theta \to \theta_{\max}. \tag{8}$$

However, it is often possible to interpret the rule $g$ as a change in the
values $p_{ij}$ in $P$ and thus to obtain a relation corresponding to (7),
namely,

$$p(y) = p(x)\,P_\zeta. \tag{9}$$


If, for example, $\theta$ is a binary variable the potential across the
condenser in Fig. 4 can be shown to estimate the probability, given
$x_i \in X$, that $y_j \in Y$ entails $\theta = 1$. A machine (although the
usage is variable, a machine is taken to be a realized physical device) of
the kind in Fig. 5 will derive $m \cdot M$ estimates of this form, which,
from moment to moment, define $m \cdot M$ matrices $P_\zeta$. (The argument
is more elaborate if $\theta$ is not binary but, providing that
$1 \ge \theta \ge 0$, is not really different.) Andrew has reviewed the
field [32], [33].

Fig. 4. Probabilistic circuit. [Circuit labels: contact X closed if
$x_i = 1$; contact Y closed if $y_j = 1$; the condenser potential estimates
$p(y_j = 1 \cap \theta = 1 \mid x_i = 1)$.]

If an adaptive “probabilistic” automaton (characterized by $P_\zeta$)
and its stationary “probabilistic” environment (characterized by $\Pi$) can
be specified in $\Sigma$, convergent adaptation is guaranteed. Thus
$P_\zeta \to P_T$. The output state probability vector
$p(y) = p(x)[\Pi(P_T)]$ will either be the fixed point vector of
$\Pi(P_T)$ or, if this process involves ergodic subsets only, of such a
subprocess or, in the limiting case, it may define a trapping state.


2.5 The Model


Our simulation of a self-organizing system involves a sequence of
systems $\Sigma_r$, $r = 1, 2, \dots$, each of which is composed of an
automaton characterized by $P_{\zeta r}$ (describing a machine $M_r$) and
its environment characterized by $\Pi_r$ (describing a physical realization
$Z_r$). In addition we postulate an external selective mechanism A
effecting a rule denoted as $\Lambda$ that selects the value of $r$.
Formally, the mapping $\Sigma_r \to \Lambda(\Sigma_r)$ is a functor with the
stipulated property shown in Fig. 6. The coupling between $M_r$ and $Z_r$
involves the input states $X_r$ and $Y_r$ and, as before, we denote the
product set of outcomes as $U_r = X_r \times Y_r$. Arguing from indifference
we assign a matrix with equal values in each entry for the initially
selected $P_{\zeta r}$.

Fig. 5. Details of probabilistic machine. [Labels: inputs
$x_1, x_2, \dots, x_m$; chance process; $p(y_j \cap \theta \mid x_i)$; stores
for interval $\Delta t$; select 1 column of analog.]

Fig. 6. Over-all picture of selection among probabilistic subsystems.
[Labels: $M_r$, mechanisms; $Z_r$, environments; $M_r$ is shown as selected.]
Now it is required to simulate a self-organizing system of the special
kind discussed in Section 2.2. Consequently the condition

$$dR(\Sigma_r)/dt > 0$$

or, since the number of input states and output states is invariant, the
equivalent condition

$$-dH(\Sigma_r)/dt > 0$$

must be satisfied.

Indeed we can stipulate that $\Sigma_r$ is defined or that its physical
realization $M_r$, $Z_r$ survives if and only if it is always the case that
$dR(\Sigma_r)/dt > 0$.

Perhaps the simplest rule for ensuring that $dR(\Sigma_r)/dt > 0$ is
embodied in the payoff function feedback that is illustrated, where
(if $\theta_r$ is the payoff function over the domain $U_r$ of $\Sigma_r$)
the system maximizes $\theta_r = R(\Sigma_r)$. [In practice, $\theta_r$ is
proportional to an approximation of $-H(\Sigma_r)$.]

In general, as $P_{\zeta r} \to P_{Tr}$, the finite difference
$\Delta\theta_r$, which is taken as an approximation to
$dR(\Sigma_r)/dt$, is positive. However, at or before the moment when
$P_{\zeta r} = P_{Tr}$, $\Delta\theta_r = 0$. Hence, although $\Sigma_r$ is
a self-organizing system initially, it is an unstable self-organizing
system.

Could this be otherwise, if some other initial assignment of
$P_{\zeta r}$ were chosen or if a different rule were embodied? Since the
system $\Sigma_r$ is a particular case of Estes’ [34] conditioning model,
in which the trapping state is not uniquely determined because we only
require any maximally regular behavior, the answer is known to be “no.”
Certainly, more subtle rules exist for maintaining $dR(\Sigma_r)/dt > 0$
over a longer interval; for example, it is possible to incorporate a
limited “forgetting” capability. But no such expedient can prevent the
eventual demise of $\Sigma_r$ as a self-organizing system.

Watanabe [35], for example, has studied the convergence of statistical
adaptive processes. If $\alpha$, $\beta$, and $\gamma$ are positive or zero
constants Watanabe suggests the form, for $t > t_0$,

$$H(\Sigma_r)|_t = \alpha\,(t - t_0)^{\beta}\,e^{-\gamma t},$$

which can be fitted to the output of statistical learning models (in
specific cases to Bush and Mosteller’s [36] model and to Luce’s [37]
model) or to empirical data (either from learning experiments involving
organisms or from a machine like $M_r$). In fact, when we start from a
maximum variety condition with equal entries in $P$ this equation can
be fitted with $\beta = 1$ and $t_0 = 0$, hence

$$H(\Sigma_r)|_t = \alpha\,t\,e^{-\gamma t}$$

or

$$\theta|_t \approx R(\Sigma_r)|_t = 1 - \frac{\alpha\,t\,e^{-\gamma t}}{\text{constant}}.$$
We detect the condition $\Delta\theta_r = R_0 > 0$ and, when it occurs,
remove $M_r$, $Z_r$ from the simulation. What happens after that? The
statement $\Delta\theta_r = R_0 > 0$ is signaled to the selective mechanism
A (which, in our model, we conceived as a mechanism of attention) and the
transformation $\Lambda$ is applied to the system $\Sigma_r$ to generate
$\Lambda(\Sigma_r) = \Sigma_{r+1}$, which is embodied in the machine
$M_{r+1}$ and its environment $Z_{r+1}$. We define $\Lambda$ to satisfy

(i) $\Lambda(\Sigma_r) \in \Sigma$ as before and, also,

(ii) $dR[\Lambda(\Sigma_r)]/dt\,|_{t_0 + \Delta t} > 0$,

where $t_0$ is the instant at which $\Delta\theta_r = R_0 > 0$ and where
$\Delta t$ is an interval $\Delta t > 1$. Of these conditions (i) excludes
the possibility of reselecting $\Sigma_r$ and (ii) is satisfied by any
system other than $\Sigma_r$ unless $\Pi_r$ is the stochastic inverse of the
initially selected $P_{\zeta r}$ (which we avoided by making each $\Pi_r$
embody some trapping state and assigning $P_{\zeta r}$ with equal initial
entries). Thus it is legitimate to specify
$\Lambda(\Sigma_r) = \Sigma_{r+1}$. Hence application of the transformation
$\Lambda$, starting with $\Sigma_1$, on each occasion when
$\Delta\theta_r = R_0 > 0$ gives rise to a sequence of systems

$$\Sigma^* = [\Sigma_1 \to \Lambda(\Sigma_1) \dots];$$

or to an over-all system where, as before,

$$\Sigma = \bigcup_r\,(\Sigma_r). \tag{10}$$

Since each member of $\Sigma^*$ satisfies $dR(\Sigma_r)/dt > 0$ it is true
that $dR(\Sigma^*)/dt > 0$ and that $\Sigma^*$ is a self-organizing system.
The entire model is shown in Fig. 7.
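
The role of the selective mechanism can be shown with a deliberately crude sketch. We assume, purely for illustration, that the redundancy of every subsystem follows the saturating curve 1 - exp(-rate·t); this curve, the threshold value, and the names `redundancy` and `run_attention` are hypothetical. The point is only that abandoning a subsystem as soon as its gain falls to the threshold keeps every recorded gain positive.

```python
import math


def redundancy(t, rate):
    """Assumed learning curve of one subsystem: rises from 0 and saturates at 1."""
    return 1.0 - math.exp(-rate * t)


def run_attention(num_systems=3, rate=0.5, r0=0.01):
    """Return (r, gain) pairs, moving to subsystem r + 1 when the gain falls to r0."""
    trace = []
    for r in range(1, num_systems + 1):
        t = 0
        while True:
            gain = redundancy(t + 1, rate) - redundancy(t, rate)   # finite difference
            if gain <= r0:
                break              # signal the selective mechanism: select the next system
            trace.append((r, gain))
            t += 1
    return trace


trace = run_attention()
```

Each subsystem taken alone would stall, yet the concatenated trace never records a gain at or below the threshold, which is the sense in which the over-all sequence remains self-organizing.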


2.6 Physical Mechanisms 


There are many network-like adaptive computing machines that can 
be used to realize instable and, in this sense, trivial, self-organizing 
systems of the kind we have just considered in the abstract. Most of 
them can also be used as components in stable self-organizing systems 
and, before we embark upon the rather abstract issues connected with 
stability (and nontriviality), we shall briefly review these mechanisms. 

Any network consists of a finite collection of elementary components
[38] (often called “artificial neurones”) that are coupled by connections
or “fibers.” Signals are unit impulses (or sequences of impulses).
If the network is finite and structurally determined, no distinction
between changes occurring in a fiber and in a component is necessary,
providing that the state of each component is specified by at least as
many variables as the number of inputs it receives. [This is always
true for the network but it need not be true for the automaton, as in
2.6(8).] Consequently, adaptation of a network can be specified in
terms of the adaptive changes that are brought about in the transfer
functions of its elementary components. It is often legitimate to imagine
a network in which all connections are made (or at least in which there
is a great deal of overconnection). Adaptive changes within the
components appear, in this picture, as a differentiation of the network
(whereby potentially available connectivity is blocked off). We shall
consider the main types of artifact.


(1) The elementary components are linear devices that summate an
input quantity. A typical input quantity is the frequency or mean rate
of impulses (analogous to the action potentials of real neurones)
arriving at the input of the elementary component concerned and
symbolized as $\beta$. Each input connection (by analogy, each synapse)
is either inhibitory or excitatory which we symbolize by a quantity
$\omega = +1$ (for excitation) or $\omega = -1$ (for inhibition). Further,
each connection is associated with a weight $1 \ge \alpha_{ij} \ge 0$.
Consider the $j$th component receiving inputs from several other
components indexed $i$. Its transfer function is

$$\beta_j \approx \sum_i \alpha_{ij}\,\omega_{ij}\,\beta_i$$

or, allowing a delay of $\Delta t$ and noting that $\omega_{ij}$ depends
upon the invariant structure of the network,

$$\beta_j|_{t+\Delta t} \approx \sum_i \omega_{ij}\,\alpha_{ij}|_t\,\beta_i|_t.$$

In simulations such as Taylor’s [39], [40], the adaptation of a given
synaptic connection on a component depends upon the previous impulse
frequencies it has experienced. Thus at $t = t_m$, and starting at
$t = t_0$,

$$\alpha_{ij}|_{t_m} \approx \sum_{t=t_0}^{t_m} \beta_i|_t + \alpha_{ij}|_{t_0}.$$
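
A minimal sketch of the summating element and of a weight that accumulates with past input activity follows. The scaling factor `gain` and the cap at 1 (to respect the stated range of the weight) are our assumptions, and the function names are hypothetical; the sketch is not a reconstruction of Taylor's simulation.

```python
def linear_output(alpha, omega, beta):
    """Output frequency of component j: sum over inputs of alpha * omega * beta."""
    return sum(a * w * b for a, w, b in zip(alpha, omega, beta))


def accumulate_weight(alpha0, beta_history, gain=0.01):
    """Weight after experiencing beta_history, starting from alpha0.

    Assumptions: activity is scaled by `gain` and the weight is capped at 1.
    """
    return min(1.0, alpha0 + gain * sum(beta_history))


# one excitatory input (omega = +1) and one inhibitory input (omega = -1)
out = linear_output(alpha=[0.5, 0.2], omega=[+1, -1], beta=[10.0, 5.0])
new_alpha = accumulate_weight(0.1, [10.0, 20.0, 30.0])
```

The excitatory input contributes 5 and the inhibitory input removes 1, so the element emits 4; a heavily used connection drifts toward the upper limit of its weight.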


Taylor has shown that suitable networks of this kind can adapt to
discriminate patterns of excitation that are applied to their input. In
fact, a great deal is known about the pattern-recognizing capabilities
of different networks. The matter has been approached from an analytic
viewpoint by Novikoff [41] and Lerner [42], from a synthetic viewpoint
by von Foerster [43], Inselberg [44], and others [45] working at the
University of Illinois, and by Aizerman [46], in Moscow, who has recently
demonstrated a separation algorithm for a large set of patterns. In the
network of Fig. 8, for example, von Foerster defines an action function
as the distribution of excitation passed from the input receptor layer
A (a retina perhaps) to the computing layer B (manifestly an action
function depends upon the coefficients $\omega_{ij}$ and $\alpha_{ij}$ if
$i$ indexes elements in layer A and $j$ indexes elements in layer B).
Imagine an indefinitely large number of infinitesimal elements. Let $a$
denote displacement along the A layer and $b$ denote displacement along the
B layer. Defining the input excitation as $\sigma(a)$ and the output
excitation as $\rho(b)$, it can be shown that for the illustrated network

$$\rho(b) \approx C\,\frac{\partial^4 \sigma(a)}{\partial a^4}$$

so that this network is a sensitive contour detector. Its action function
$K(a, b)$ is a member of the class of binomial action functions which
have several important properties such as producing no output for a
uniformly distributed input and yielding further binomial action
functions if layers like A and B are iterated through a network. The
immediately interesting point is whether or not $K(a, b)$ could arise by
any reasonably adaptive process. In this respect, von Foerster’s group
have considered a “maturational” adaptation. (In other words, they
ask whether fibers could develop from elements in A to others in B
according to some plausible plan and in such a fashion that $K(a, b)$
would characterize the resulting network.) Here, of course,
$\alpha_{ij} = 1$ or $0$ and $\omega$ remains to distinguish excitatory
from inhibitory fibers.

Fig. 8. Network for binomial action function. [Weights from an element of
layer A to elements of layer B: $+6$ to the element directly opposite,
$-4$ to each nearest neighbor, $+1$ to each next-nearest neighbor.]

It can be shown that $K(a, b)$ will result from a random walk
development process. Given a one-to-one mapping from A to B (such as the
mapping $1 \to 1$, $2 \to 2$, $\dots$, $m \to m$), consider chance
perturbations of a developing A fiber from the assigned position in B. If
these perturbations are generated by a random walk process which has
variance $\mathrm{Var}_+$ for excitatory fibers and $\mathrm{Var}_-$ for
inhibitory fibers and if $\mathrm{Var}_- \gg \mathrm{Var}_+$ the resulting
network will, on average, transfer activity from A to B according to
$K(a, b)$.

We comment that although the “maturation” or adaptation is
statistically specified in the sense that a random process is involved,
there is nothing haphazard about the specification. The random process
is independent of directional bias and has the caliber of a forcing
function that represents (in an abstract model) the physical development
that occurs due to the energetics of a real brain or artifact. The
crucial assumptions are a sufficient number of independent developmental
steps and the variance inequality $\mathrm{Var}_- \gg \mathrm{Var}_+$.
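
The properties claimed for a binomial action function can be checked numerically with the weights +1, -4, +6, -4, +1 read from Fig. 8. The sketch below treats the layers as discrete and one-dimensional and ignores the edges of the layer; both are simplifying assumptions, and the function name is ours.

```python
def apply_action_function(sigma, kernel=(1, -4, 6, -4, 1)):
    """Pass the excitation of layer A through the fixed weights to layer B.

    Only positions where the whole kernel fits inside the layer are returned.
    """
    half = len(kernel) // 2
    return [sum(k * sigma[b + d - half] for d, k in enumerate(kernel))
            for b in range(half, len(sigma) - half)]


uniform = apply_action_function([3] * 10)             # evenly illuminated retina
ramp = apply_action_function(list(range(10)))         # smooth gradient
edge = apply_action_function([0] * 5 + [1] * 5)       # a contour in the middle
```

A uniform input and a smooth ramp both give no output, whereas the step produces a burst of activity confined to the neighborhood of the contour.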


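(2) Many adaptive networks feature threshold components with
characteristics

$$\beta_j|_{t+\Delta t} = \begin{cases} 1 & \text{if } \gamma_j|_t \le \sum_i \omega_{ij}\,\alpha_{ij}|_t\,\beta_i|_t, \\ 0 & \text{if } \gamma_j|_t > \sum_i \omega_{ij}\,\alpha_{ij}|_t\,\beta_i|_t. \end{cases} \tag{11}$$

Here, $\beta_j$ is interpreted as a unit impulse of unit amplitude (or any
arbitrary and constant value) and the term $\gamma_j|_t$ is called the
threshold. In the simplest case the threshold value is constant so that
$\gamma_j|_{t+\Delta t} = \gamma_j|_t = \gamma_j$.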
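
A threshold component of the kind described by (11) can be sketched directly; the function name and the particular weights below are illustrative assumptions.

```python
def threshold_unit(alpha, omega, beta, gamma):
    """Emit 1 when the signed, weighted sum of the inputs reaches the threshold."""
    total = sum(w * a * b for w, a, b in zip(omega, alpha, beta))
    return 1 if total >= gamma else 0


# two excitatory inputs and a threshold of 2: the element fires only for both
both = threshold_unit([1, 1], [+1, +1], [1, 1], gamma=2)
one = threshold_unit([1, 1], [+1, +1], [1, 0], gamma=2)

# one excitatory and one inhibitory input: the second input vetoes the first
alone = threshold_unit([1, 1], [+1, -1], [1, 0], gamma=1)
vetoed = threshold_unit([1, 1], [+1, -1], [1, 1], gamma=1)
```

The same element therefore realizes conjunction or inhibition according to the signs of its connections and the value of its threshold.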

McCulloch and Pitts’ [47], [48] networks utilize elements of this kind
but adaptation is not explicitly considered. Widrow [49], however, has
constructed threshold artifacts using “adaline” devices with “memistors”
as adaptive coupling elements while Willis [50] has simulated a
number of adaptive networks in which various rules of adaptation
have been used to vary the values of the $\alpha_{ij}$ and the
$\omega_{ij}$. Willis [51], for example, considers an adaptive process that
involves negative weights as well as positive weights. A particular
adaptation rule that leads to successful performance in a single-layer
recognition device is a function of input $\beta_i$ and output $\beta_j$:

    $\beta_i|_t$    $\beta_j|_{t+\Delta t}$    $\Delta\alpha^*_{ij}$
        1               1                 +0.01  = $\Delta_{11}$
        1               0                 -0.01  = $\Delta_{10}$
        0               1                  0     = $\Delta_{01}$
        0               0                  0     = $\Delta_{00}$

where $\alpha^* = \alpha\,\omega$ and
$\Delta\alpha^*_{ij} = \alpha^*_{ij}|_{t+\Delta t} - \alpha^*_{ij}|_t$. (12)
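
Rule (12) amounts to a four-entry lookup table, sketched below. The names `DELTA` and `adapt` are ours, and nothing is assumed about how the rule is scheduled across a network.

```python
# change in the signed weight, indexed by (input at t, output at t + dt)
DELTA = {
    (1, 1): +0.01,   # active input followed by an output: strengthen
    (1, 0): -0.01,   # active input followed by no output: weaken
    (0, 1): 0.0,     # inactive input: no change
    (0, 0): 0.0,
}


def adapt(alpha_star, beta_i, beta_j_next):
    """Return the signed weight after one application of the rule."""
    return alpha_star + DELTA[(beta_i, beta_j_next)]
```

For example, `adapt(0.5, 1, 1)` raises a weight of 0.5 to 0.51, and only connections whose input was active are ever modified.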


By far the largest block of data on adaptive threshold systems is due
to F. Rosenblatt [52], [53] [who has conducted many experiments with
“perceptrons”]. In most cases the change in coupling coefficients (the
$\Delta$ quantities cited above) depends upon some external instructor who
adjusts the value of a reinforcement variable $\theta$ according to his
approval of the perceptron’s behavior. (The instructor may, of course, be a
program.) In an “alpha” perceptron (assuming $\beta_i = 1$ or $0$ only),

$$\Delta\alpha^*_{ij}|_{t+\Delta t} = \theta\,\beta_i|_t; \tag{13}$$

whereas, for a “gamma” perceptron,

$$\Delta\alpha^*_{ij}|_{t+\Delta t} = \theta\left(\beta_i|_t - \frac{1}{M}\sum_i \beta_i|_t\right), \tag{14}$$

where the index $i$ refers to those elements which may be coupled to the
$j$th element and where $M$ is the number of inputs to the $j$th element.
Other modes of reinforcement are possible; for example, in various
simulations

$$\Delta\alpha^*_{ij}|_{t+\Delta t} = \theta|_t\,(\beta_i|_t\,\beta_j|_{t+\Delta t}) - \text{constant}$$

is the rule employed. None of these networks, least of all “perceptrons,”
are limited to a couple of layers or even a laminar topology. So far as
the perceptron is concerned, the minimal arrangement is shown in
Fig. 9. Although many structures are discussed by Rosenblatt we shall
consider (apart from the minimal case) only the most elaborate perceptron
structure that has been realized, which is shown in Fig. 10.
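
The difference between the “alpha” and “gamma” modes of reinforcement can be seen in a short sketch. We read the gamma rule as subtracting the mean input activity from each input, which is our interpretation of rule (14); the function names are hypothetical.

```python
def alpha_update(theta, beta):
    """Alpha mode: every active input changes its coupling by theta."""
    return [theta * b for b in beta]


def gamma_update(theta, beta):
    """Gamma mode (as read here): subtract the mean activity from each input."""
    mean = sum(beta) / len(beta)
    return [theta * (b - mean) for b in beta]


inputs = [1, 0, 1, 0]
d_alpha = alpha_update(0.1, inputs)
d_gamma = gamma_update(0.1, inputs)
```

Under this reading the gamma changes sum to zero, so the total coupling to the element is conserved, whereas the alpha changes raise the total whenever any input is active and the reinforcement is positive.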

Von Foerster considers a particularly interesting plan for adaptation
[54]. Define the logical strength of a Boolean function as the number of
zeros in its truth table representation. Let $\zeta$ index the set of all
Boolean functions of $m$ variables with the restriction that
$\zeta_a > \zeta_b$ if the logical strength of $a$ is greater than the
logical strength of $b$. We now consider the network in Fig. 11. Let each
element compute a function with logical strength of 0 (namely the least
specific function, with all unit entries in its truth table, the
tautology). The output from the summating device, $\sum_i \beta_i$,
actuates a $\zeta$ increasing process by operating upon each element
so that, for an arbitrarily chosen sequence of binary inputs,
$\zeta_i|_{t+\Delta t} \ge \zeta_i|_t$ whenever
$\sum_i \beta_i|_{t+\Delta t} > \sum_i \beta_i|_t$.

Fig. 11. Adaptive network.

The network adapts to become increasingly specific. We may regard
this as a discrete maturational process or conceive the adaptation
taking place inside a probabilistic device when these requirements are
satisfied by one of Uttley’s models for generating a more specific
structure within an overconnected conditional probability machine [55].
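
Logical strength, as defined above, is simply a count over the truth table. The helper below is a sketch with a name of our choosing; it accepts any Python function of m binary arguments.

```python
from itertools import product


def logical_strength(f, m):
    """Number of zeros in the truth table of a Boolean function of m variables."""
    return sum(1 for bits in product((0, 1), repeat=m) if not f(*bits))


tautology = logical_strength(lambda a, b: 1, 2)
disjunction = logical_strength(lambda a, b: a or b, 2)
conjunction = logical_strength(lambda a, b: a and b, 2)
contradiction = logical_strength(lambda a, b: 0, 2)
```

For two variables the strengths run from 0 for the tautology, through 1 for “or” and 3 for “and,” to 4 for the contradiction, which is the ordering along which the network is said to become more specific.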


(4) Maron [56], [57] examines the behavior of fully connected
networks of $m$ input elements characterized by

$$\beta_j|_{t+\Delta t} = 1 \quad \text{if} \quad \prod_{i=1}^{m} \alpha_{ij}|_t\,\beta_i|_t > \gamma_j|_t$$

as in Fig. 12. It can be shown that such elements have a rational
inductive inferential behavior, according to the tenets of Bayes’
hypothesis. By taking logarithmic measures, the criterion can be reduced
to an additive form. (The present representation is more convenient for
our discussion.) In view of a probabilistic interpretation that can be
placed upon this kind of network it is important to notice that two
quantities, $\alpha_{ij}$ and $\gamma_j$, are variable. (Maron’s treatment
is closely related to Uttley’s two variable probabilistic computation
hypothesis [27].)

Fig. 12. A fully connected network.

(5) Crane [58], [59] has developed a computing system in which the
unitary components are active transmission lines. An electrical version
is shown in Fig. 13 where discharge of condenser $C_n$ is assumed to
dissipate energy along a path which leads to closure of the contact
$T_{n+1}$, which, in turn, leads to discharge of condenser $C_{n+1}$.
Hence a “wave” of potential decrement is transmitted from any point of
stimulation and is accompanied by a contact closing “wave.” In practice,
the contacts may be realized by a thermistor material that is heated
(impedance lowered) by the discharge current of an adjacent condenser.
In this case the contact closing “wave” is a thermal disturbance. Since
the condenser $C_n$ takes a finite interval to recharge to a critical
potential $V$ through $R_n$ there is a refractory interval (after a wave
has been propagated along a line) within which no further wave can be
propagated. Crane calls his active transmission lines “neuristors” since a
nerve fiber is a particular realization of this mechanism.

Fig. 13. A neuristor transmission line.

Considering the thermistor embodiment, we can create either thermal
or electrical coupling between “neuristor” lines, as in Fig. 14 and Fig.
15. Given the further facility of a unidirectional propagation element it
is possible to compute the “or” function as well as the “not.” Hence, it
is possible to compute any Boolean function with neuristors (and it has
been shown possible, in fact, to compute any Boolean or probabilistic
function, economically, with neuristors).

Fig. 14. Thermal and potential coupling.

Fig. 15. Thermal coupling, potential uncoupled.

An adaptive “neuristor” network has been constructed in my own
laboratory [60, 61, 62] and is outlined in Fig. 16. The neuristor
impedance $R$ and its nonlinear “contact” are realized by different forms
of the same physical process and, although this is not a necessary
expedient (we could, perfectly well, change the $R$ coupling by one process
and the $T$ coupling by another), it leads to some interesting additional
properties.

Fig. 16. A chemical realization of a variable neuristor transmission line.
[Labels: dendrites; instable.]

The physical process is the development of metallic dendrites,
controlled by electrolysis. $R_n$ simulation, by stable dendrites, is
readily achieved, at a crude level. MacKay [63, 64] used a refined form of
dendrite as a variable impedance and proposed its use as a delay component.
(The DC potential between a pair of electrodes induces the development
of a relatively conducting dendrite in a relatively nonconducting
metallic salt solution, and the impedance between the electrodes is
sensed by an AC current. However, if the solution is so constituted that
a back reaction tends to dissolve the dendrite, so that the
electrodeposition is countered, then it is also possible to produce
unstable dendrites that act as $T_n$ components.)

The adaptive process can be reinforced either by varying the
electrolytic current or the concentration of the metallic ion from which
the dendrite is constructed. Either $R_n$ or $T_n$ development can be
fostered; but since there is no hard and fast distinction between a stable
and an instable dendrite, adaptation may also give rise to ambiguous
components (or ambiguous couplings). Of course it is possible to separate
the $R_n$ dendrites from the $T_n$ dendrites, for example, by growing these
components in isolated chemical systems. But there is no need to do
this in order to grow a network, and perfectly reasonable adaptations
can take place in which, although it is possible to define the performance
of the network, many of the physical components cannot be
unambiguously assigned to the $R_n$ or $T_n$ form.

Similar comments apply to adaptation in neuristor networks built
from passivated fibers (Lillie’s [65] iron wire models) where a network
simulation starts (in its overconnected form) as a steel wire scrubber 
hurled into cool 70 per cent nitric acid and differentiates as the often 
stimulated fibers wear away. A more refined approach has been adopted 
by Stewart [66], who has combined passivated neuristor elements with 
a dendritic mechanism for changing their coupling as a function of their 
activity. The same comments apply with greater cogency to neuristor 
networks realized as a mesh of polymer macromolecules, the poly- 
merization of the network being controlled by local catalysis modulated 
with impulses passing along the partially constructed transmission 
lines. (This, of course, is the informationally desirable scale proposed 
by Bowman [67], and physical chemists admit that the system is 
marginally feasible). 
(6) Beurle [68, 69] and, more recently, Farley [70] and Clark [71]
have simulated networks that are statistical models using digital
computer programs. Babcock [72] has built a special purpose machine for
work of this kind.

In a typical simulation, the artificial neurones have well-defined 
properties somewhat similar to the threshold devices described by 
(11) but with the added property of a refractory interval in which, after 
excitation, the element cannot be excited again by any input and a 
relative refractory interval in which the element can only be excited 
with difficulty. (These refractory properties are realized by a variation
in $\gamma_j$ after $\beta_j = 1$.)

The connectivity of a given simulation is defined according to a
statistical rule and a random number table. (Hence, any one simulation is
well specified.) The statistical rule usually stems from empirical data
about the dendritic field of real neurones and may, for example, stipulate
that the probability of connection between unit $i$ and unit $j$ falls
off exponentially with the physical distance between $i$ and $j$. The
simulation involves a very large number of unitary elements; and we
are concerned with its macroscopic behavior, which is found to be
invariant with respect to changes in the random number table. Such changes
will, of course, generate an assembly of different, well-defined, networks
which are supposed to characterize an infinite ensemble of networks
with the stipulated statistical properties.
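
The way a statistical rule and a random number table jointly fix one well-specified network can be sketched as follows. We assume, for illustration only, that the units lie on a line, that the distance between units i and j is |i - j|, and that a seeded pseudo-random generator stands in for the random number table; `build_network` and `length_scale` are hypothetical names.

```python
import math
import random


def build_network(n, length_scale, seed):
    """Return the set of directed connections (i, j) among n units on a line.

    The probability of a connection falls off as exp(-distance / length_scale).
    """
    rng = random.Random(seed)     # the seed plays the part of the random number table
    edges = set()
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < math.exp(-abs(i - j) / length_scale):
                edges.add((i, j))
    return edges


net = build_network(50, 2.0, seed=1)
same = build_network(50, 2.0, seed=1)
# macroscopic property: fraction of connections spanning at most 6 units
near = sum(1 for i, j in net if abs(i - j) <= 6) / len(net)
```

Repeating the construction with the same seed reproduces the network exactly, while a macroscopic quantity such as the proportion of short connections is governed by the statistical rule rather than by the particular seed.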

The macroscopic behavior of these simulations is characterized by 
waves of excitation that are propagated in various ways through the 
network. Dynamic adaptations take place and, in addition, the inter- 
action between waves of excitation gives rise to more or less permanent 
structural modifications. The network may be self-oscillatory, and 
stable modes are possible in networks that involve sufficient inhibitory 
connections. The elaborate perceptron is a special laminar case of 
a potentially self-oscillatory network. 

(7) Papert [73] has pointed out that a network capable of computing
the $2^{2^m}$ possible Boolean functions of $m$ binary inputs must be
adaptively controlled by a parameter that assumes $2^{2^m}$ values. For
large values of $m$ this structure is gigantic and unrealizable.
Consequently we cannot really build networks that adapt to recognize any
pattern of stimulation imposed upon their input, if the dimension of the
input is reasonably large. Some constraint must be introduced although, as
Papert also points out, the constraint need not be too severe. The
problem of adapting to recognize whether a given input pattern belongs
to $A$ or $B$, where $A$ and $B$ are disjoint subsets of the set $C$ of all
input patterns, is tractable provided that the number of members of
$A \cup B$ is modest and that no response is defined for members of
$C - (A \cup B)$. (Notice that no logical restriction has been imposed upon
the composition of $A$ or of $B$.)
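
The size of the unconstrained problem, and the contrast with the constrained one, can be made concrete. The recognizer below is a plain lookup table, which is our own illustrative device and not a mechanism proposed in the text; its storage grows with the number of members of A ∪ B and not with the number of possible input patterns.

```python
def boolean_function_count(m):
    """Number of distinct Boolean functions of m binary inputs."""
    return 2 ** (2 ** m)


def make_recognizer(set_a, set_b):
    """Return a function mapping a pattern to "A", "B", or None if undefined."""
    assert not (set(set_a) & set(set_b)), "A and B must be disjoint"
    table = {pattern: "A" for pattern in set_a}
    table.update({pattern: "B" for pattern in set_b})
    return table.get


recognize = make_recognizer({(0, 1, 1)}, {(1, 0, 0), (1, 1, 1)})
```

Already for m = 5 there are 4,294,967,296 Boolean functions, whereas the recognizer above needs three table entries and leaves every other pattern without a defined response.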

Often, restrictions arise out of the choice of transfer function. It is
well known, for example, that threshold components, characterized
by (11), can only adapt to compute linearly separating functions. This
limitation is considered by Scott Cameron [74] and Singleton [75].
Briefly, an input pattern to an $m$ input threshold component is a
binary $m$ vector and the state of this component (if the threshold
$\gamma$ is constant) is a point in a space with the $m$ coordinates
$\alpha_{ij}$. Adaptive variation of the $\alpha_{ij}$ locates a hyperplane
defined by

$$\sum_i \omega_{ij}\,\alpha_{ij}\,\beta_i = \gamma_j.$$

If a pair of input pattern vectors can be separated by some hyperplane
determined by some assignment of $\alpha_{ij}$ values, they are linearly
separable and the threshold element can adapt to discriminate between
them in the sense that $\beta_j = 1$ for one and $\beta_j = 0$ for the
other. But adaptation within a single threshold element can achieve no
other discrimination and it can be shown that the proportion of linearly
separable functions of $m$ input variables decreases very rapidly with an
increasing value of $m$. The possible adaptations of a single layer of
threshold elements are also, of course, crucially dependent upon the
$\Delta$ rule chosen for (12) or (13) or (14). (This issue is considered,
at length, by Willis [50] and Rosenblatt [52].)
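
The restriction to linearly separating functions can be verified by exhaustive search for two inputs. The sketch tries every signed integer weight in a small grid and every half-integer threshold; that this grid suffices is an assumption we rely on only for m = 2, and the function names are ours.

```python
from itertools import product


def is_linearly_separable(truth, m, grid=(-2, -1, 0, 1, 2)):
    """truth maps each binary m-tuple to 0 or 1; search for weights and a threshold."""
    inputs = list(product((0, 1), repeat=m))
    thresholds = [g + 0.5 for g in range(-2 * m - 1, 2 * m + 1)]
    for weights in product(grid, repeat=m):
        for gamma in thresholds:
            if all((sum(w * x for w, x in zip(weights, pattern)) >= gamma)
                   == bool(truth[pattern]) for pattern in inputs):
                return True
    return False


def count_separable(m):
    """Count the Boolean functions of m inputs that one threshold element computes."""
    inputs = list(product((0, 1), repeat=m))
    return sum(is_linearly_separable(dict(zip(inputs, outputs)), m)
               for outputs in product((0, 1), repeat=2 ** m))


separable_2 = count_separable(2)
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

Of the 16 functions of two inputs, 14 are found to be separable; the two that fail are “exclusive or” and its complement, for which no hyperplane exists whatever grid is searched.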

Less restrictive conditions upon adaptive modifications are probably 
desirable. Pursuing a suggestion due to Willis we might restrict our 
attention to automata capable of computing only disjunctively de- 
composable Boolean functions of the form. 


S(t. --%m)s 
where there exists some other function 
g(a... 2%), m>I>1 
such that 
S(t... %) = 9 (X41--- Xm) 


or, generalizing the idea, functions of 
f(y sn Bey) 


that can be expressed by / functions of subsets of o variables,m > o > 1. 
If this structure exists the automata can be reduced to subsystems such 
as those in Fig. 17. (It is necessary to distinguish this form of decomposition
of computing process from the partitioning of an hierarchy of
control which, in a sense, is a decomposition of the organization imposed 
upon the computing process.) 


Fig. 17. Subsystems in a partitioned system.


We comment that an hierarchy of control could, very readily, be
introduced to select the partitioned subsystems in Fig. 17 (so that, for
example, either B_1 or B_2 is processing the input data). Indeed, such an
hierarchy will be needed in any case to convert the Δ adjustments made
(as suggested above) in terms of event frequencies into adjustments
made upon the basis of “desirable” event frequencies.


(8) It is prudent to lay a rather different emphasis when discussing
the constraints that act upon the adaptive or even “growing” structures
of 2.6(5) and the statistically specified, self-oscillatory networks of
2.6(6). (Incidentally, such a network can perfectly well be “grown”;
and if its components are rendered infinitesimally small, they reduce to 
neuristors.) Neither the transfer function of an individual component 
nor its detailed connectivity is particularly relevant to most of the 
enquiries we might make. The whole idea of an individual component 
becomes a fiction as we reduce the volume in which a signaling event 
is manifest as a local change of state and as [due to the ambiguities 
of 2.6(5) which, at a microscopic level, are no longer optional] we blur 
the distinction between a signal and the structure that transforms this 
signal. 

Macroscopic properties are, of course, important and these may be 
either (i) properties of the material from which the artifact is built (if
a neuristor network has been fabricated with a thermistor layer as the
nonlinear constituent, this fact determines a maximum and a minimum
transmission rate) or (ii) topological properties of a structure (given an
anisotropic amplifying medium like a plane of neuristor, a toroidal
connectivity will lead to self-oscillatory action). In general, the physics
of a self-oscillatory network determines its stable and resonant modes. 
Moreover, a designer cannot get rid of these. They are constraints upon 
the assembly. Peter Greene [76], [77] has suggested how a designer can 
take advantage of their existence when realizing a self-organizing 
system. We reiterate the point of 1.8, that interesting automata are, 
in fact, physically realized. Their design is not a matter of logic alone 
but a compromise between logic and nature. 

Finally, recall the distinction of 1.2 between localized automata that 
can be identified with specific objects and unlocalized automata that 
reproduce and possibly evolve in a medium. Although no hard and fast 
distinction exists, the structures of 2.6(5) and 2.6(6) are more akin 
to media than computational objects. The automata that reproduce are 
organizations, which may be spatially localized or which may, like 
stable oscillatory modes, be spatially distributed. (These organizations 
are automata in the sense that the existence of stable activity maintains 
the condition in which certain computations take place.) The important 
point is that cybernetic concepts (like “an hierarchy of control”)
apply to an organization and only in very special cases to some localized
structure. We shall return to this topic of realizing unlocalized automata
in 2.8.


2.7 Control Hierarchies and the Stable Self-Organizing System 


The model of a self-organizing system developed in 2.5 is trivial 
because it is inherently instable (in the sense that eventually it will 
not appear to become more organized or more adapted). The same com- 
ment applies to any realization of this model, however sophisticated 
the mechanism that is chosen from 2.6 to embody it. In order to examine 
the important distinction between stable and instable self-organization 
we must, as suggested in 1.2 and in 1.3, look at the linguistic constraints 
entailed in our relation to the physical artifact. 


(1) The control hierarchy of 2.4(3) is isomorphic with ℳ of Fig. 18.
The c_i ∈ C are called “subcontrollers” which interact with the environment
by selecting the term y in the product pair u = x, y. The b_j ∈ B
are higher level “subcontrollers” which select from the c_i ∈ C, and A
selects amongst the b_j ∈ B.

Fig. 18. The system ℳ. In the case of 4 values of C parameter and 2 values
of B parameter these illustrations are isomorphic.

Mesarovic [11] distinguishes between “causal” and “teleological”
descriptions. (In a “causal” description we state the exact dynamics of
an automaton over the domain of its inputs; in a “teleological”
description we assert the basis of the automaton and the goal it is
designed to achieve.) For the moment we assume that all the subsystems
in ℳ are adaptive and that ℳ is (and must be) defined in L* in the
latter fashion. For use later we adopt the convention that a statistically
determined system is a “causal” system; in other words, it would be
irrelevant and unnecessary to open the “chance” device in order to
achieve a “causal” specification. The c_i ∈ C compute a particular class
of function and aim to achieve the subgoal of maintaining a particular
feature of their environment invariant. Similarly, the b_j ∈ B compute
another class of function and aim to achieve a higher level invariance.
Further, A selects among the b_j ∈ B to achieve an over-all goal. Consequently,
the hierarchy of control can also be interpreted as an hierarchy
of goals. As one plausible identification of ℳ the c_i compute
functions f as in (4) when their index i = ¢. The elements b; compute 
the functions denoted as g in (5) selecting successive values of i = ¢ 
and the over-all controller A selects values of 7 = r according to the 
rule A. 


(2) As MacKay [78] has pointed out, it is possible to regard the b_j ∈ B
as selecting among symbols for the invariant features of the environment
maintained by the c_i ∈ C (or equivalently the goal of b_j is achieved
by selecting among the subgoals). Similarly, the goal of A is to be
achieved in terms of a selection among symbols for the invariants
maintained by the b_j ∈ B. Hence, in a certain sense, the behavior of
the different levels in ℳ is representable as a sequence of expressions
in different levels of language. Let us make this idea more precise.


(3) An observer (who may encounter rather than construct ℳ so
that he is ignorant of its exact structure) can specify various experiments
upon this artifact in terms of the scientific metalanguage L*. For each
experiment he must communicate with some part of ℳ, and his
interaction amounts to discourse in a language that Cherry [79] calls
an object language L⁰. We shall write L⁰ = V⁰, Ω⁰, ξ⁰ because L⁰
consists of a vocabulary or a finite alphabet of signs v ∈ V⁰, a set of
restrictions Ω⁰ which determine the admissible methods of concatenating
the signs v ∈ V⁰ to form expressions v_1, v_2, ... in L⁰, and the
denotation of v ∈ V⁰, written as ξ⁰. Some signs, say v ∈ V_1⁰, denote
operations, admitted by Ω⁰, for concatenating signs. If L⁰ is used to
describe the behavior of some c_i ∈ C, for example, these operations
depend upon the functions computed by c_i and the functions specified,
in the environment, by the experimenter. Other signs, v ∈ V_2⁰ say,
denote equivalence classes of states; for example, if v_i ∈ V_2⁰ and u_i ∈ U
then ξ⁰ may establish the mapping v_i ↔ u_i/E, where E is a relation of
equivalence in U.

We are admittedly using the term “language” in a slightly eccentric
fashion. In the first place (like most formal languages but unlike natural
languages) V⁰ is determined and invariant. Again (like most natural
languages but unlike formal languages) the denotation ξ⁰ is specified
in L⁰. Hence L⁰ is an identified “language.” However, although we
shall have occasion to depart from this convention, the eccentric usage
is convenient.


(4) It is possible to distinguish between the languages L defined in
L* and, in addition, between the levels of languages L defined in
L*. We shall denote the level of a language as η = 0, 1, ... and use this
index to assert that if L^(η+1) and L^η are defined in L* then L^(η+1) is a
metalanguage with reference to L^η in the sense that further axioms are
needed to derive L^(η+1) from L^η. We adopt the convention that η = 0
is the index of L⁰, an object language, and comment that, if L^η, η > 0,
is defined in L*, it should, strictly, be called a system metalanguage to
distinguish it from L*.

Since we are committed to a “teleological” description there will
necessarily be an hierarchy of experiments concerned with ℳ which
entail communication between an observer and the physical artifact
in terms of an hierarchy of system metalanguages. If L⁰ is used to
communicate with c_i ∈ C, then the v ∈ V_2⁰ denote a set of equivalence
classes of u ∈ U_i, while expressions in L⁰ are the behaviors of some
c_i ∈ C. Similarly, if L¹ is used to communicate with b_j ∈ B, then v ∈ V_2¹
denote c_i goals or symbols for the invariants preserved in the environment
by the c_i ∈ C. The L¹ expressions will be b_j strategies of control and
the axioms needed to derive L¹ from the object language L⁰ will be
the specification of goals in the “teleological” description of ℳ. Finally,
it will be possible to interact with A in terms of L².

The nontrivial distinction between languages at a given level depends
upon a distinction between the type of their denotation. We shall use
the index μ = 1, 2, ... for achieving this distinction and comment
that if different values of this index are specified, L_1^η, L_2^η, then the
denotation of L_1^η and the denotation of L_2^η are distinct ontological
classes.


(5) Any system, in the sense of 2.4, is isomorphic with an identified 
language defined in L*, although the converse is untrue. To demonstrate 
the identity notice that a system £, is specified in L*, by definition, and 
that the description & is a denotation of equivalence classes of states.
Thus, for &, = U,, F,, &,, the alphabet V of a corresponding language 
L = V,2&, & consists of V, and V, of which V, is the set ue U, where for 
each u,; there is a correspondence u,; ++ >>;= 0,;/H determined by @, for 
o, €2,. Next the operations in 5; are members of F so that V, is the 
set vo> F and V, = &(F)U “‘o” where ‘“o” is composition. Since 
V=V,U V, we obtain V = &(U,Ué(F)u “o” and &, is a part of &. 
Finally since F c[U, U,...] the constraints 2 are operations that 
disallow some relations in[U, U,...] — F. 

(6) In his teleological description, Mesarovic distinguishes between 
levels and goals and between interactions that involve normal communi- 
cation (inputs and outputs of subsystems) and those that involve goals 
(statements of evaluation). Thus a simple adaptive control mechanism 
is a single-level single-goal system if it is viewed in a teleological fashion. 
On the other hand, it can obviously be reduced to or discussed in terms 
of a causal system if the parameter-adjusting strategy and a sensible 
part of the environment have been specified. If the corresponding 
automaton is finite and localized and if the environment is stationary, 
this causal specification is possible at a certain level of language (which 
will characterize a certain level of experiment). Let us call the level of
language required to render a reduction from teleological to causal
representation possible η_max. For the single-level single-goal control
mechanism, η_max = 1. Hence, if a suitably denoted L¹ is defined in L*
and if experiments are performed at this level of communication, they
can refer to a causal system in which there is no distinction between
goal interaction and normal communication. On the other hand, if
experiments are performed at the level of L⁰ the system will necessarily
appear “teleological.”

The system ℳ is a many-level many-goal system (unless special
restrictions are applied when it degenerates into a single-goal system).
For ℳ, the value of η_max = 2, and consequently ℳ appears to be causal
in experiments conducted at the level of L² but teleological in experiments
performed at the level of L⁰ or L¹. At the most, ℳ may have
four goals at level η = 0 in the possible object languages L_1⁰, L_2⁰, L_3⁰, L_4⁰,
two goals at η = 1 corresponding to the pair of metalanguages L_1¹ and
L_2¹, and an over-all goal. The system is representable in causal form in
L²; hence η_max = 2.

One of the most degenerate forms of ℳ is shown in Fig. 29 (p. 179):
although the hierarchy L⁰, L¹, L² is preserved, all distinction between
language types is obliterated by the expedient of minimizing communication
(in the sense of normal input and output coupling) between the
goal-directed subsystems. The structure in Fig. 29 is a typical sequential
processing organization (of a kind we shall often encounter in connection
with artificial intelligence) and contrasts with the parallel two,
four hierarchy of the unrestricted ℳ.

Broadly speaking ℳ would neither be degenerate nor reducible to a
“causal” form if (i) the subcontrollers c_i ∈ C act upon environments
that are incomparable in L* or (ii) their goals are incompletely comparable
in L*, or are partially incompatible or (iii) the over-all goal is
no more than ostensively defined in L* or (iv) there is some evolutionary
process that builds an hierarchy by adding on subcontrollers.

With these comments in mind, let us return to our model of a self- 
organizing system. Let us examine the system in terms of an object 
language which denotes either states u ∈ U or equivalence classes of
these states. There are a couple of extreme possibilities, namely: (i) 
It is impossible to distinguish the &,. (ii) There is a distinct type label 
attached to any &, that has once been observed. 

Assumption (i) is plausible in view of the initial isomorphism of the
subsystems M_r. Adopting it, the experimental object language L⁰
will have an alphabet V⁰ (we write V⁰ rather than V_2⁰ for convenience,
since it is not necessary to consider V_1⁰) with signs denoting equivalence
classes of states that contain one representative member of each disjoint
subset U_r. Thus, if v_i ∈ V⁰ this denotation implies


\[ v_i \leftrightarrow u_i/E = \bigcup_{r=1}^{p} (u_{ir}). \]


Now it is true that the redundancy R(V⁰) will increase over a long
interval (since the construction of L⁰ implies that the observer is
looking at a statistically lumped version of the p-fold process [(P,,,J7,),
A,]). But since H_max(V⁰) is constant and H(V⁰) will fluctuate due
to the discrete selections made by A, the observer will not see a self-organizing
system and ΔR(V⁰)/Δt will not be consistently positive.
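
The observer's test can be illustrated numerically. The sketch below assumes the usual definition of redundancy, R = 1 − H/H_max, with H the Shannon entropy of the observed sign frequencies and H_max the logarithm of the alphabet size; this passage does not restate the formula, so the definition, the function names, and the sample counts are all assumptions made for illustration.

```python
import math


def entropy(frequencies):
    """Shannon entropy (bits) of a list of sign counts; zero counts contribute nothing."""
    total = sum(frequencies)
    probabilities = [f / total for f in frequencies if f > 0]
    return -sum(p * math.log2(p) for p in probabilities)


def redundancy(frequencies):
    """Assumed definition R = 1 - H / H_max, with H_max = log2(alphabet size)."""
    h_max = math.log2(len(frequencies))  # constant for a fixed alphabet of 2 or more signs
    return 1.0 - entropy(frequencies) / h_max


# Hypothetical sign counts over three successive observation windows.
windows = [
    [25, 25, 25, 25],   # no preference among signs
    [70, 10, 10, 10],   # one subsystem has adapted
    [40, 20, 20, 20],   # a fresh subsystem has just been selected
]
trace = [redundancy(w) for w in windows]
changes = [later - earlier for earlier, later in zip(trace, trace[1:])]
```

In this invented trace the redundancy rises and then falls again, so the successive changes are not consistently positive, which is the situation the observer faces under assumption (i).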

Indeed, if the obvious triviality of having only p subsystems is
removed by providing a mechanism, call it “J,” that generates an
unlimited supply of M_r (so that A can continue to select subsystems
indefinitely), it is no longer true that an observer experimenting in
L⁰ will discern an adaptive system, however long he looks. The proposed
modification is indicated in Fig. 19 but it could equally well be realized
by a “Pandora’s box” system such as Foulkes [80] describes.

Now consider assumption (ii) which suggests experimental communication
in terms of distinct object languages L_r⁰ with alphabets so
denoted that V_r⁰ ↔ U_r. The observer will now see a sequence of adaptive
systems, each with a goal of maximizing the index Δθ_r/Δt ∝ ΔR(V_r⁰)/Δt,
which spring up in succession. By observing a sequence of these,

Fig. 19. Model for evolutionary system.


the observer could reduce the goal of “adaptation” to some causal form
(like a rule θ → T). But he would be in difficulties about the action of
A in selecting the M_r, which cannot be causally represented in any
L⁰. Nor, if “J” replaces the finite set of subsystems, can the action of
A be represented in any combination of the L_r⁰ such as L⁰_max with

\[ V_{\max} \leftrightarrow U = \bigcup_{r=1}^{p} (U_r). \]

Indeed, to make sense of the system and to justify collecting the
necessarily distinct subsystems into a sequence called a self-organizing
system, the observer must invoke some further axiom, and in this way
he is forced to construct L¹ wherein V¹ denotes the set of subsystems
M_r. At this stage the deliberate triviality of the model is revealed
because the goal in L⁰ and the goal in L¹ are isomorphic and the selective
process A discerned by experimental interaction in L¹ has precisely the
same form as the rule θ → T (in the sense of Mesarovic the model is a
single-goal many-level system which, in the absence of “J,” could be
transformed into a many-goal single-level system). In passing, the
triviality does exhibit one feature with an important analog in artificial
intelligence if the basic process of problem solving is introduced in place
of the basic process of adaptation [in 3.2(4) on p. 185].

As in the case of ℳ the possibility of “causal” representation (and
triviality with respect to self-organization) can be avoided by any of
the expedients 6(i), 6(ii), 6(iii), 6(iv), these being equivalent to adjoining “J.”


(7) Any appearance of self-organization entails some ignorance
on the part of an observer. The interesting issue is not the fact of this 
ignorance but the form it assumes. We shall consider a few typical 
cases, illustrating them (when pertinent) with our model. 

Type (i). A system is called “self-organizing” because an observer
who knows V⁰ and its relevant denotation ξ⁰ (perhaps because the
system is an artifact he has built) is ignorant of Ω⁰. As he discovers
Ω⁰ the system’s behavior seems to become more organized.

This is the marginal case of “black box” observation. Suppose the
observer wishes to know Ω⁰ in order to encode information and communicate
with the “black box.” He inductively infers the functions computed
by the black box from its inputs (which he may control) onto its outputs.
Increased knowledge of Ω⁰ increases his experimental efficiency (particularly
if the “black box” has adaptive parameters).

The crucial point is that in order to make sense of this enquiry
V⁰, ξ⁰ must be well defined. Typically and nontrivially, the “black
box” might contain a learning machine with a well-specified adaptation
rule such as Gabor’s “learning filter” [87] or one of the systems cited
in Andrew’s [32] discussion.

Type (ii). An observer performs experiments in L⁰ and wishes to
maintain communication within a given universe of discourse. If, in
order to achieve this result, he is forced to communicate also in L¹, he
calls the system “self-organizing” [60].

A typical situation arises in experimental psychology. The object
language of the experiment is L_1⁰. The subject may also attend to
irrelevant fields of attention by uncontrolled interaction in L_2⁰. In
order to maintain L_1⁰ communication, the psychologist (as in an interview)
issues instructional statements which amount to expressions in
L¹. (MacKay [82] has discussed the semantic and informational status
of instructions and similar metalinguistic assertions.)

This simple situation can be simulated on the slightly refined model 
shown in Fig. 20. The chief refinements concern A, which becomes a 
probabilistic rather than a sequential rule (given a request to select 
some M_r the mechanism A chooses one with a probability distribution
p_1, p_2, ...) and the criterion for requesting selection (given that M_r
is selected, a selection is requested from A if 46,*s T, > 0). 

Fig. 20. Demonstration arrangements.


The environments Z_r are replaced by signal lamps actuated by the
M_r outputs and buttons actuated by an experimenter that deliver
stimuli or inputs to M_r. This communication takes place in L⁰, the
distinctions L_1⁰, L_2⁰ being arbitrary.

In addition, there is a further set of buttons which convey L¹ instructions
“look at a,” “look at b,” where the a and b are values of r.
Any instruction momentarily lowers the limit T_r (interpreted as a sort
of expectancy) which later returns to its normal value. In addition, a
specific instruction signal such as “look at Z_r” is averaged, individually
for each value of r, to derive an “r instruction rate,” g_r. The values of
the p_r are computed in proportion to the g_r and inverse proportion to
the θ_r (hence in proportion to the possibility of r adaptation).

The experimenter is told that the machine will aim to maximize its 
rate of adaptation, but the correspondence between stimuli and the 
instruction button inputs can be concealed by shuffling their connections. 
In these conditions the behaviors of the machine and of the experimenter 
prove amusingly lifelike. 
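
The probabilistic selection rule of this demonstration can be sketched as follows. This is only an illustration under stated assumptions: the selection probabilities are taken to be proportional to an instruction rate divided by the current adaptation level for each r, every adaptation level is assumed positive, and the function names and sample numbers are hypothetical rather than taken from the apparatus.

```python
import random


def selection_probabilities(instruction_rates, adaptation_levels):
    """Probabilities proportional to instruction rate and inversely to adaptation level."""
    weights = [g / theta for g, theta in zip(instruction_rates, adaptation_levels)]
    total = sum(weights)
    return [w / total for w in weights]


def select_subsystem(probabilities, rng):
    """Draw the index r of the next subsystem according to the given distribution."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


# Two subsystems instructed equally often; the second is already three times as adapted.
probabilities = selection_probabilities([1.0, 1.0], [1.0, 3.0])   # about [0.75, 0.25]

rng = random.Random(0)   # seeded only so that repeated runs give the same draws
draws = [select_subsystem(probabilities, rng) for _ in range(1000)]
```

Under these assumptions a subsystem that has adapted little, or that the experimenter keeps pointing at, is chosen more often, which is the sense in which the machine favors the possibility of further adaptation.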

Type (iii). A system is called self-organizing because an observer,
anxious to interact with it, either is or becomes ignorant of either the
alphabet or the relevant denotation (of a given alphabet) that is required
in order to sustain this communication. This ignorance may
apply at a given level (the observer knows η but is uncertain of μ) as in
“changes of attention” or it may also involve the value of η (as in
“reinterpretation of stimuli”). In the simplest case the observer is
placed in the position of a biologist who (unlike the engineer) recognizes
that in nature L⁰ must be discovered by broad scrutiny of the animal
in its natural habitat. (The admission that there must be a close match
between the symbols in L⁰ and the stimuli used in an experiment is
fairly recent and demarcates modern behaviorism from its naive
precursor.) To cite a case, the visual system of an animal as simple as
the frog accepts and responds to symbols in the space of four quite
bizarre attributes of the environment, while excitations of the retina 
which appear like atomic stimuli to the experimenter fail to elicit any 
response. These attributes could hardly be discerned by any number of 
meticulous enquiries intended to reveal principles of perception that 
are “simple” according to the normal tenets of simplicity. However, 
they are readily discovered by a crass, intuitive examination of what 
a frog actually does. A closely related issue is considered in 3.2(7). 

Most creatures also change their attention or their attitude. Hence
a biologist countenances the existence of several L_μ⁰ and makes μ > 1
as a matter of course.

Type (iv). A system is called self-organizing because its hierarchy 
of control (or, in Mesarovic’s sense, its structural hierarchy of goals) 
is modified and possibly extended as a result of discourse. A system of 
this kind will exhibit all the curiosities of type (iii) systems and type 
(ii) systems, but the behavior of these could be accounted for on the 
basis of invariant if obscure organizations (brought into play by specific 
agencies like mechanisms of attention). The type (iv) self-organizing 
system will be more or less indelibly modified by structural changes. 
These, of course, will alter not only the code it uses but the level at 
which communication takes place and the level of abstraction at which
data are processed. There is nothing mechanically absurd in the suggestion
that such a system can learn concepts, and control procedures
must have the logical caliber of conversations. Our idea of an invariant
framework of languages breaks down when the artifact acquires the
ability to build language systems; η and μ must be regarded as variables
and our communication systems as approximations to the existing state
of affairs.


(8) A type (iv) organization is derived from ℳ as in Fig. 21.

Fig. 21. Cooperative interaction between the subcontrollers in terms of control.

Andreae [83] has recently constructed an artifact that lies on the
borderline between type (iii) and type (iv). As in a type (iii) system,
Andreae’s device is hierarchically structured. At one level it learns about
sensory motor connections, at the next about the solution to problems, 
and at the next about sets of solution patterns. On the other hand, it 
has more than one mode of activity. It may seek external reinforcement 
or it may perform an internal reorientation of solution patterns, 
seeking to achieve internal reinforcements. Like a type (iv) system, its 
objective is to learn [84] not to have learned. It must involve an evolutionary
process, in which subsystems compete, in some pertinent
sense, for survival. But the development of an increasing level of
organization depends upon cooperation between the subsystems.
Mechanically speaking, the activities of the device will be distributed 
(which is a flexible and, in some cases, stable arrangement). Conse- 
quently, the invariant feature of this cooperative system (which may 
be embodied, for example, in the connectivity of a network) is an 
organization. The abstract automaton that images this physical struc- 
ture is an unlocalized automaton. 


2.8 Unlocalized Automata 


(1) As indicated in 1.8 we are concerned with unlocalized automata 
that reproduce and evolve. How can these automata be conveniently 
represented? 

A localized automaton has a representation, as in 2.4(1), that is
isomorphic with a state graph. An unlocalized automaton can also be 
represented in this fashion, but the formalism is cumbersome and apt 
to be misleading. 

In the first place, we must recognize the possibility that the auto- 
maton may not reproduce (that the physical machine it represents fails 
to survive). Hence there must be criteria whereby its state graph is or 
is not defined. With Ashby [85] and Rosen [86], we must admit that an 
automaton does not reproduce itself. It is reproduced because of a
dynamic interaction with its environment. (The fact that we are 
talking about unlocalized automata implies that we regard this inter- 
action as relevant.) So the state graph must be defined if and only if 
certain relations (an abstract reflection of material and energetic 
relations, to do with the “metabolism” of the machine) are ade- 
quately maintained. Rosen has formalized rather special cases of this 
situation. 

Next, the proliferation and evolution of the automaton need to be 
represented, and this entails a calculus for extending and modifying 
the state graph. Rashevsky [87] has dealt with this problem for some
biological systems but his transformation methods are difficult to 
instrument in the case of automata that compute. 

It is perhaps better to admit that an automaton is a property of the 
medium in which it is defined. (If we regard the flux of physical con- 
stituents that form an animal as part of the environment of the organ- 
ization “‘animal” then this medium is the environment. On the other 
hand, it is more usual, as in 1.2, to define a special internal environment, 
such as a brain, as the medium in which automata can evolve.) For 
economy and elegance we seek the least specialized medium that is 
possible. The proposal made by von Neumann [88] was an infinite plane
of cells (a so-called “tesselation”) in which any cell i could assume a
finite number of states u ∈ U_i. Any state subset U_i includes a special
“null” state u_0 ∈ U_i.

Von Neumann’s representation has been developed by Burks [89]
and Loefgren [19]. An automaton is defined as a configuration of not-null
states on a tesselation. (Hence, it is a property of this tesselation.)
Entry into u_0 implies the obliteration of some aspect of an automaton
and the transition from u_0 into another state entails the creation of
some aspect of an automaton. The states of cell i, say, undergo transition
according to a rule that depends upon the immediate state of i,
u ∈ U_i, and the state of neighboring cells, say j ... l. Formally, this
rule is a mapping

\[ \xi_i : [U_i, U_j, \ldots, U_l] \to (U_i). \]
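
A local transition rule of this kind is easy to sketch. The example below is an illustration only: it assumes a finite grid with wrap-around edges in place of the infinite plane, a four-cell neighborhood, two states per cell, and a hypothetical rule under which a null cell leaves the null state when any neighbor is not null. None of these choices is taken from the constructions cited above.

```python
NULL = 0  # the special "null" state present in every cell's state set


def neighbor_cells(i, j, rows, cols):
    """Four nearest neighbors on a finite grid with wrap-around edges (an assumption)."""
    return [((i - 1) % rows, j), ((i + 1) % rows, j),
            (i, (j - 1) % cols), (i, (j + 1) % cols)]


def local_rule(own_state, neighbor_states):
    """Hypothetical rule: a null cell takes state 1 if any neighbor is not null."""
    if own_state != NULL:
        return own_state
    return 1 if any(state != NULL for state in neighbor_states) else NULL


def step(grid):
    """Apply the local rule to every cell simultaneously and return the new grid."""
    rows, cols = len(grid), len(grid[0])
    return [[local_rule(grid[i][j],
                        [grid[a][b] for a, b in neighbor_cells(i, j, rows, cols)])
             for j in range(cols)]
            for i in range(rows)]


def count_not_null(grid):
    """Size of the configuration of not-null states."""
    return sum(state != NULL for row in grid for state in row)


grid = [[NULL] * 7 for _ in range(7)]
grid[3][3] = 1                      # a single not-null cell as the initial configuration
after_one = step(grid)              # 5 not-null cells
after_two = step(after_one)         # 13 not-null cells
```

With this rule the configuration is a connected region that spreads outward at each step; richer rules would be needed for anything resembling replication.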


A localized automaton is a connected region of not-null states. An
unlocalized automaton is commonly either a connected region that
spreads out over the tesselation, as in Burks’s construction, or a wave
of replicating individual automata that spread out over the tesselation
as in Loefgren’s construction. An observer is at liberty to interpret any
feature of this process as the automaton that is relevant. In particular
we shall be concerned with evolutionary processes in which, to begin
with, there are automata a ∈ A, interacting in object language L⁰, that
satisfy some pragmatic criterion [they are “historians” or “philosophers”
as suggested in 2.7(6)] with respect to any environmental
constraints we may impose in L⁰. As a result of evolution, there appear
further automata b ∈ B (but B is a species of individual organizations
b which consist of cooperative aggregates of a which interact in terms
of L¹). Now if the b ∈ B also act like “historians” or “philosophers” we
can perfectly well call them the automata of interest. Indeed, if the
b ∈ B are better historians or philosophers and we aim to interact with
the “best” species, we are forced into discourse with b ∈ B, and, consequently,
into constructing an hierarchy of metalanguages L⁰, L¹, ...,
which places us in the position of the observer in 2.1.


(2) The completely abstract tesselation model is mathematically 
elegant but it is difficult to choose interesting rules ξ and it is often
difficult to interpret the configurations that appear. It would be con- 
venient to have a direct interpretation for values like proximity to a 
goal and the cost of maintaining the structure needed to perform what- 
ever computations are involved in goal achievement. On the other hand, 
it is obviously desirable to preserve the logical simplicity of the tessela- 
tion model. We need some representation between this abstraction and 
the self-building computer programs we shall encounter in connection 
with artificial intelligence. 

As a compromise, Masanao Toda [90] has conceived a model in which
automata akin to small animals move around in a very simple environment
characterized chiefly by a distribution of “food.” This “food” is
a commodity that the automata must acquire and store because their
structure must be paid for in terms of food, expended for this purpose.
In addition to seeking food, the automata in Masanao Toda’s model
seek an independent goal and may cooperate with one another in 
pursuing it. Grey Walter [91], Barricelli [92], and Goldacre [93] have 
also conceived models of this kind. 

Independently, various similar models have been simulated (by hand
simulation, assisted by the apparatus mentioned in connection with
the model of 2.4, and also by computer simulation) in my own laboratory
[94, 95, 96]. One of these will be briefly described.

The most primitive automata are creatures, a ∈ A, able to move about
in their environment, to eat the food available, and to emit signs (which 
they can also receive). These primitive automata are able to reproduce 
and create further automata but their survival depends upon the 
acquisition of sufficient food. 

The environment in which the automata evolve is a network of 
nodes, either over-all toroidally or over-all planar connected. Each node 
is associated with a food store that is filled at a rate that depends 
upon the availability of “food” (which is determined by the experi- 
menter) and upon the local conditions. This environment is ‘‘malleable’’ 
in the sense that the local rate of ‘‘food’’ inflow depends upon the food 
that has previously been eaten from the store. 

In terms of the actual simulation, the food store is a condenser of
capacity C that is charged through resistances R_1 + R_2. Of these R_1
is a fixed linear element, whereas R_2 is a nonlinear thermistor element.
Hence R_2 depends not only upon the current that has passed (and
heated the element) but also upon a lag term, which determines the
rate at which the heat is dissipated from the element. We make the
dissipation slow with reference to the motion of the automata.
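
The food store just described can be sketched numerically. The following
is a minimal illustration, not the circuit actually used: the class name
`FoodStore`, the parameter values, the square-law heating term, and the
choice that the thermistor resistance rises with heat are all assumptions
made for the sketch (the text does not state the sign of the thermistor
effect). It only shows how a store that charges through a fixed plus a
history-dependent resistance makes the local inflow depend upon past
eating.

```python
from dataclasses import dataclass


@dataclass
class FoodStore:
    """One node's food store: a capacitor C charged through R1 + R2."""
    capacity: float = 1.0   # C
    r_fixed: float = 1.0    # R1, the fixed linear resistance
    r_cold: float = 0.5     # R2 when the thermistor is cold (assumed value)
    r_gain: float = 2.0     # assumed: extra resistance per unit of heat
    cooling: float = 0.01   # assumed: slow heat-dissipation rate (the lag term)
    v: float = 0.0          # potential on C, i.e. the food level V
    heat: float = 0.0       # thermistor state

    def step(self, supply: float, eaten: float, dt: float = 0.1) -> float:
        """Advance the store by dt (forward Euler) and return the new V."""
        r_total = self.r_fixed + self.r_cold + self.r_gain * self.heat
        inflow = max(supply - self.v, 0.0) / r_total
        # The charging current heats the thermistor; the heat leaks away slowly.
        self.heat = max(self.heat + dt * (inflow ** 2 - self.cooling * self.heat), 0.0)
        self.v = max(self.v + dt * (inflow - eaten) / self.capacity, 0.0)
        return self.v


# Usage: an undisturbed store fills toward the supply level,
# while a store that is being eaten from stays lower.
idle, grazed = FoodStore(), FoodStore()
for _ in range(2000):
    idle.step(supply=1.0, eaten=0.0)
    grazed.step(supply=1.0, eaten=0.2)
```

Because the heat decays slowly, a node that has been heavily used keeps a
modified inflow rate for some time afterwards, which is the sense in which
the environment is “malleable.”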

As in Fig. 22, the automata attach themselves to the nodes and par- 
tially discharge C by eating. Hence the potential on C, denoted V, 
changes according to the number and avidity of the automata resting 
at the node concerned, as well as the influx of current. 










Fig. 22. Simulated automata in environment. (Diagram labels: node,
automata, population.)


When an automaton rests at a node it eats food at a rate that is
proportional to the difference between V and θ, where θ is the amount
of food that the automaton has accumulated in its internal store, unless
θ > V, when the automaton is unable to eat. The food in the internal
store is depleted to pay for the fabric of the automaton at a rate p which
is a function of the age of the automaton. If θ falls below a certain
value, θ_0 > θ, the automaton falls apart and is removed from the model.
On the other hand, if θ exceeds a certain value, θ > θ_m, a further
automaton is born as an offspring. At this point the parent automaton
is “rejuvenated.” It and its offspring each receive a starting amount of
food in their internal stores of ½θ_m. In each case the aging function that
determines the cost of maintenance, p, is assigned its age 0 value.
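
The feeding, decay, and reproduction rules of this paragraph can be put
into a short sketch. The threshold values, the eating constant `k`, and
the linear aging function `upkeep` are hypothetical choices made only for
illustration; the rules themselves (no eating when the internal store
exceeds V, removal below the lower threshold, an offspring above the upper
threshold, and half the upper threshold as the starting store) follow the
text.

```python
from dataclasses import dataclass

THETA_MIN = 0.2  # assumed lower threshold: below it the automaton falls apart
THETA_MAX = 2.0  # assumed upper threshold: above it an offspring is born


def upkeep(age: float) -> float:
    """Assumed aging function: the maintenance cost grows linearly with age."""
    return 0.05 + 0.01 * age


@dataclass
class Automaton:
    theta: float = THETA_MAX / 2  # internal food store
    age: float = 0.0
    alive: bool = True

    def step(self, v: float, k: float = 0.5, dt: float = 1.0):
        """Spend one time step at a node with food level v.

        Returns (food_eaten, offspring), where offspring is None unless born.
        """
        # Eating is proportional to V - theta and impossible when theta > V.
        eaten = k * (v - self.theta) * dt if v > self.theta else 0.0
        self.theta += eaten - upkeep(self.age) * dt
        self.age += dt
        if self.theta < THETA_MIN:
            self.alive = False
            return eaten, None
        if self.theta > THETA_MAX:
            # Parent is "rejuvenated"; parent and child both start at half THETA_MAX.
            self.theta = THETA_MAX / 2
            self.age = 0.0
            return eaten, Automaton()
        return eaten, None


# Usage: a rich node triggers reproduction, an empty node leads to decay.
parent = Automaton()
_, child = parent.step(v=5.0)

starved = Automaton()
for _ in range(50):
    if not starved.alive:
        break
    starved.step(v=0.0)
```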

In terms of the actual simulation, Fig. 23, the internal store is a
condenser C' charged through a resistance R and a diode that prevents
the automaton from eating if θ > V. The “constant” current valve
determines the cost of maintenance. When a further automaton is
produced, contacts I and II are momentarily closed. Hence ½θ_m is
transferred to the internal store of the offspring and the condenser in the
age circuit is discharged.











Fig. 23. “Age” circuit and “stomach.” (Diagram labels: offspring if
θ > θ_m; remove automaton if θ_0 > θ; to internal computation; offspring.)


An automaton can move to any of its neighboring nodes or it can
remain where it is. The decision to move is made upon the basis of
several items of evidence which we shall consider in a moment. The
adaptation in the automaton entails placing various “learned” inter-
pretations upon this evidence.

The chief data, however, are information about the value of V pre-
vailing at the various accessible nodes (for the automaton aims to
survive, hence to maximize θ, which depends upon an adequate supply
of food to eat). The automaton is thus born with a sensory apparatus
that allows it to discern the value of V at the five accessible points
indicated in Fig. 24 (which are also the points “O” to which an auto-
maton resting at “+” can move at the next instant).
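
As a rough illustration of the sensory situation, the sketch below lists
the five accessible points on a toroidally connected square lattice and
picks the one with the highest V. Two things here are assumptions rather
than statements of the original design: that the five points are the
current node plus its four lattice neighbors, and that the choice is a
plain greedy maximum. The automata described in the text decide by an
adaptive process that this sketch does not attempt to reproduce.

```python
def accessible_nodes(pos, width, height):
    """Current node plus four neighbors on a torus (assumed reading of Fig. 24)."""
    x, y = pos
    steps = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    return [((x + dx) % width, (y + dy) % height) for dx, dy in steps]


def greedy_move(pos, food, width, height):
    """Placeholder decision rule: go where the sensed food level V is highest."""
    return max(accessible_nodes(pos, width, height), key=lambda node: food[node])


# Usage on a 4 x 4 torus in which only the node "behind" the origin holds food.
WIDTH = HEIGHT = 4
food = {(x, y): 0.0 for x in range(WIDTH) for y in range(HEIGHT)}
food[(3, 0)] = 1.0
target = greedy_move((0, 0), food, WIDTH, HEIGHT)
```

The wrap-around in `accessible_nodes` is what distinguishes the toroidal
network from the planar one, where edge nodes would simply have fewer
neighbors.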





Fig. 24. Possible moves.


The decision to move into one or the other of these locations depends 
upon certain design principles: 


(i) An automaton must be active; for, in fact, we are interested 
in the motion of automata, rather than automata themselves. Con- 
sequently, the rate of eating at a node is made greater than the 
maximum rate of replenishment of the food. Hence, an automaton 
that remains in one position must eventually decay. 

(ii) The automata must gain as a result of correlated activity; in 
terms of game theory, the payoff function must determine an essential 
nonzero-sum game with a number of participants that depends upon 
the accumulated payoff. 


The basic requirements are facilities for providing an automaton,
say, a_1, with information about the action contemplated by any neigh-
boring automaton, say a_2, with which its activity is correlated and
facilities for adaptive modification of the coupling between a_1 and a_2.
These requirements are satisfied by providing a communication system
whereby the automata can indicate their state before a motion is com-
pletely determined. We comment that this provision is strictly redun-
dant. It is possible to replace the communication system by an inter-
pretative facility exercised in respect to the sensory system. (For
example, in other simulations, the automata have been designed to
sense the gradient of food change and hence have the potentiality for
interpreting a steep change of food level as a sign for the presence of
another automaton.)
The decision circuit that selects among the motions 1 or 2 or 3 or 4
or 5 is a set of amplifiers with common cathode connection, the outputs
of which actuate each of five trigger circuits. (These select one of five
actions and are so constrained that one and only one can be energized
at any instant.) The amplifier outputs in the a_1 decision circuit modu-
late the amplitude of five oscillators of frequency F1, F2, F3, F4, F5
to produce signs α_1, α_2, α_3, α_4, α_5. The oscillator outputs are combined
and applied to an image network of the “food” network, as in Fig. 25,
at the image node of the node at which a_1 is located in the “food”
network. The oscillator output signal, which conveys information about
the tendency of a_1 to select each of the alternative motions, is attenu-
ated in the image network and received by other neighboring automata,
such as a_2. In a_2, the signal is filtered into components F1, F2, F3, F4,
F5. These are rectified and averaged and, through the θ-maximizing
adaptive circuit of Fig. 26, determine the sense as well as the degree
of coupling between the decision process in a_1 and in a_2 or, of course,
vice versa.

Fig. 25. Food and signal networks. (Panels: food network; signal
network.)

Starting with at least one automaton, the rate of food inflow is 
increased, and the automata reproduce to form a population in dynamic 
equilibrium with the food that is available. 

Groups of automata form due to cooperative coupling, either by
explicit communication, or through the food network. In this connec-
tion it is important to recall that the signals and the image network are,
in a sense, redundant. The automata interact with their environment.
They may also interact through their environment with one another.
Hence, there are organizations which are not automata as such, nor
even groups of coupled automata, but organizations partly embodied
in the environment, i.e., some property of this medium.

Fig. 26. Signal system.

*The behavior of this simulation demonstrated several features of the evolutionary
process and the form of simulation made it possible to examine why the behavior took
place. However, it was obviously impracticable to use more than a few of the rather
elaborate individuals. In more recent work we have simulated statistically respectable
populations consisting of between 200 and 500 “live” individuals on a small computer
(an I.C.T. 1220 machine) using a program that embodies most of the characteristics
described but which allows for multiple data processing (the “individuals” have some
shared facilities). This work is also restricted by mechanical practicality, but is part
way to a program that is being written for the ATLAS type computer. We shall only
comment upon aspects of the evolutionary system behavior that appeared in the small
program behavior as well as the initial simulation.

These organizations reproduce, using a mechanism of reproduction 
which has evolved, in place of the reproductive process built into the 
automata, which is relegated to a subsidiary place. The evolved process 
is a dynamic “template” mechanism. Any behavior of the automata 
necessarily induces a pattern P upon the environment because of its 
“malleable” character. Suppose this pattern favors the perpetuation 
of this or similar behaviors. The automata which jointly give rise to a 
behavior z, which are characterized by certain adaptive modifications, 
act upon their environment to produce P, which favors the survival 
of this sort of automaton (or more pertinently, since it is the motion of 
automata, not the automaton, that is important at this stage, the 
survival of this behavior). The process is autocatalytic and represented 
by a mapping 


$$ z \;\underset{\text{“Template”}}{\overset{\text{“Inducing”}}{\rightleftarrows}}\; P \quad \text{or, equivalently,} \quad \text{Activity} \rightleftarrows \text{Structure} $$

that is defined providing that this organization (of which the mapping
is a specification) can obtain sufficient food. In fact, stable organizations
are characterized by many-to-many mappings from a set Z of z into a
set F of P such that any z in Z induces some P in F and any P in F
induces some z in Z.
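
The closure condition on a stable organization can be stated as a small
check. The sketch assumes that the two directions of the mapping are
given as dictionaries of sets (`induces` from behaviors to patterns,
`templates` from patterns back to behaviors); these names and the data
layout are illustrative only.

```python
def is_stable_organization(behaviors, patterns, induces, templates):
    """True if every z in Z induces some P in F and every P in F induces some z in Z.

    behaviors, patterns: sets Z and F.
    induces:   dict mapping each behavior z to the set of patterns it induces.
    templates: dict mapping each pattern P to the set of behaviors it favors.
    """
    forward = all(induces.get(z, set()) & patterns for z in behaviors)
    backward = all(templates.get(p, set()) & behaviors for p in patterns)
    return forward and backward


# Usage: a many-to-many mapping that closes on itself, and one that does not.
Z = {"z1", "z2"}
F = {"P1", "P2"}
induces = {"z1": {"P1", "P2"}, "z2": {"P2"}}
templates = {"P1": {"z1"}, "P2": {"z1", "z2"}}
closed = is_stable_organization(Z, F, induces, templates)

broken_templates = {"P2": {"z1", "z2"}}  # P1 no longer favors any behavior in Z
open_ended = is_stable_organization(Z, F, induces, broken_templates)
```

The check says nothing about the food requirement mentioned in the text;
it captures only the mutual inducing of activity and structure.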

In passing, we comment that if Z is identified with a set of oscil- 
latory modes and if F is identified with a set of synaptic modifications 
induced in a malleable network when these modes exist, this z, P
model is isomorphic with a mechanism of learning in neurone networks 
proposed by J. W. S. Pringle [97]. This model is also a special case of 
Wiener’s [7] formulation of self-replication, the “noise” that acts as a 
forcing input to Wiener’s filter being the autonomous activity in the 
system. 

Hence evolution entails the development of different levels of organ-
ization, or, by analogy, of a species B from the original species A. We
regard automata a in species A as level 1 organizations and members
of the species B as level 2 organizations. There is an interaction between
level 1 organizations, as distinct from an interaction between level 2
organizations, and these interactions are characterized by languages,
say L^0 for level 1 and L^1 for level 2. The signs in the L^0 language
are discrete motions and their indices α. The signs β, say, in the L^1
language are distributions of food or sequences of signals. Commonly
an α sign has little effect upon a level 1 organism, and a β sign will have
little effect upon a level 2 organism. But many an expression in L^1 will
induce some L^0 expressions and many sequences of L^0 signs will
induce an expression in L^1. Thus there is AB and BA interaction, and
in some cases identification occurs between L^0 terms and L^1 terms.

At high density an interaction effect reminiscent of crystallization 
takes place. Some automata differentiate to indulge in distinct and 
invariant capacities in the level 2 organization. (They may, for example, 
perform only one motion as members of a chain of automata.) Broadly, 
differentiation is due to the fact that many automata are born and live 
their life in an environment that is almost entirely determined by their 
neighbors.

Although we have provided only one A species of automaton,
differentiation admits the coincident existence of several distinct B
species of organization, say, B_1, B_2, ... which may have languages
L_1, L_2, ... that are distinct, and it becomes necessary to distinguish
between interactions like A ⇄ B (between levels) and others like
B_1 ⇄ B_2 (at a given level of language).

To what extent is this a nontrivial self-organizing system (providing, 
of course, that it continues to evolve)? The medium or environment is 
always capable of fostering the mechanisms that evolve. The fact is 
that different properties of the medium become important at different 
levels in evolution. It would, in a certain sense, be possible to predict 
the possible modes of interaction, and this is also true if we adopt the 
obvious expedient of specifying an indefinitely extensive medium. 
But although this kind of comment is true, it is largely irrelevant. 

As observers, we are anxious to interact with the organizations that
evolve, to make them compute for us (like the population of apes in
1.8) and to make them adapt their computation so that certain goals are
achieved. (In the present model these goals will be to bring about cer-
tain food distributions.) The lowest level interaction in which we can
indulge is to observe a distribution of a ∈ A or of the α signs they emit
in L^0 and to operate either (i) upon the local food distribution or (ii)
upon the signal network with α signs. (One or the other is admissible
depending upon details of the specification.) In this way we can induce
specific adaptations (that achieve the goal). Thus we can make the
automata behave, in a trivial sense, like the desired “philosophers” or
“historians”. But unless we have chosen a trivial goal which can be
optimally achieved by a ∈ A (so that there are no better “philosophers”
or “historians” than a ∈ A) we can do more than this by interacting
(in terms of the β signs) with b ∈ B. But we are forced to build an
hierarchy of metalanguages L^0, L^1, ... in order to maintain this
interaction with organizations in species B_1, B_2, ... and, further, to
translate between the languages L_1, L_2, ... of different species B_1, B_2, ....
We comment that it is not difficult to relate A, B_1, B_2, ... to the
forms (or, as they are often called, ‘‘plans’’) of program which are used in 
artificial ‘intelligence’ systems. On the other hand, there seems to be 
no reason why a self-organizing system of this kind should be reducible 
to triviality. To put the matter from a logician’s point of view, our 
communication with the evolving organization entails building an 
hierarchy of nontrivial metalanguages, and its unambiguous description 
involves an hierarchy of logical types of statement. 


3. Artificial Intelligence 


3.1 Basic Definitions 


(1) When we say that X is intelligent, X being a machine that some- 
body has built, we usually mean more than the trite assertion that 
X can deal with a suitably encoded intelligence test. The fact is, 
although we may accept test passing as sufficient evidence that a man 
is intelligent, we need more evidence when predisposed against X 
because it is a machine. MacKay’s [98] distinction between “intelligence”
and “intellect” is pertinent. Constructors and critics of artificially
intelligent devices seem to be aiming for intellect (creativity in pursuit 
of rational as well as imaginative ends) rather than the logical dexterity 
that satisfies a narrow definition of “intelligence.” Given a man, we 
can take a modicum of intellect for granted. By convention, we cannot 
assume the existence of intellect in a machine and we shall take the 
requirement of intellect as an objective, that is, ideally, to be satisfied 
by an artificial intelligence. 

Tests for the logical component of intelligence present no difficulty.
To satisfy them, the tested device must compute a suitably elaborate 
set of functions of its environmental input. Tests for the intellectual 
component are quite a different matter. 

The fabric of an artifact is irrelevant to its intellect (and to its 
computing capability as well). To be told that a man has a brain made 
of tinplate or blancmange does not shake my faith in his intellect. 
Similarly the mechanical specification of an artifact is irrelevant, for 
I could not recognize an intellectual circuit and doubt whether it is a 
meaningful entity. Whatever else, the test for intellect applies to the 
behavior of an artifact and not to the mechanism that mediates this 
behavior or the material from which it is built. Ashby [99] laid emphasis 
upon this point when he proposed a crucial test for intelligence in terms 
of the selective activity of a system. 

Among its other quirks, “intellect” is the disjunction of many
ostensively defined properties we feel bound to ask for in the repertoire
of anything, man or machine, that is intellectual. Thus the system
undergoing test should be able to use signs for things as its symbols
(to solve problems in a universe of rational discourse rather than a
factual environment).

Another property we look for is adaptation. The exercise of intellect 
implies a certain lability, so that the function computed by an intel- 
ligent machine is adjusted to meet the demands of the moment. But 
this much lies within the repertoire of many computers and controllers 
that are never deemed intellectual; for example, the self-organizing 
systems of type (i) in 2.7(7). In order to pass the lability test, an artifact 
must change not only the function it computes but its system of sym- 
bols (its “concept” of its environment). Conversely, it must be undis- 
turbed if it is presented with different environments. (For each, it must 
construct a suitable representation of its own accord.) Hence it is, at 
least, a self-organizing system of type (ii) in 2.7(7), or of type (iii) in 
2.7(7). 

A test of lability and symbol construction is very much stronger than
a test for the nontrivial employment of symbols for it involves “concept”
building. The question is, in the first place, “In how many different
environments must the machine be able to build concepts?” or “How
many different kinds of problem must it be able to solve?” and, secondly,
“How different must these environments or problems be?” There is
no completely unequivocal reply, but to avoid triviality, the set of 
environments that are used in the lability test must be formally in- 
comparable within the language L* that is used to describe the test 
situation. Hence, any machine that passes the test will have, by defini- 
tion, a behavior which is a self-organizing system of type (iv) in 2.7(7). 
It will be able to carve out an area of relevance within a wider environ- 
ment and it will be able to impose a conceptual pattern upon this area. 

On the other hand, not every type (iv) self-organizing system is 
intelligent. We might regard a Martian as intelligent or not (assuming 
he could pass the lability test), and our view about him would not depend 
entirely upon the tricks he could perform. We could perfectly well say 
he was a type (iv) self-organizing system (yet an unintelligent one) 
because we could not understand how or why he managed to control 
his surroundings. In the first place we might be unable to appreciate 
features of the environment that were obvious to a Martian. (All the 
same we could, quite conceivably, observe the regularities achieved by a 
control system that used these abstracted features as its input signs.) 
Secondly, we might fail to understand a Martian’s objectives or to 
discern what he deemed important. (A Martian may not eat food or 
need his batteries charged or have any very consistent metabolic
requirements.) 

Any decision we make in this matter is heavily weighted by our 
attitude. We agree that a control mechanism for an office block elevator 
is a computer but deny it intelligence although, when presented with 
a demonstration machine that computes exactly the same functions, 
we may waver in our pronouncement. Partly, this is due to the fact
that we know the control mechanism has no option of its own. Mostly,
however, our rejection of the elevator control as potentially intelligent
is due to familiarity alone. We are accustomed to this particular
automaton and have assigned it an other than intellectual status.

Given a self-organizing system of type (iv) in 2.7(7), its intelligence
depends upon the form of metastatement that we have made in order
to associate its otherwise disparate component systems, any one of which
may represent a separable concept. Most people would agree to acknowledge
the intellectual facet of intelligence if the relations between these
concepts have the caliber of the relations between our own concepts 
of the same environment, and if the machine concepts are acquired in 
much the same way as our own. Crudely, an intelligent artifact learns 
in the same way that we learn. This comment can be extended to other 
aspects of mentation. Thus a “proof” offered by the artifact must have 
the status of the ‘proofs’ we offer. (Ideally, an artifact should not 
be constrained to a single type of proof; for example, to count as 
intelligent it should be capable of acting like some kind of historian, 
some kind of lawyer, some kind of biologist, and some kind of physicist.) 
The property of intelligence entails the relation between an observer 
and an artifact. It exists insofar as the observer believes that the artifact 
is, in certain essential respects, like another observer. The best check 
for the property is an attempt to converse with the artifact and to 
develop joint concepts as a result of this conversation which is sug- 
gested in 2.7(8). But manifestly this is not a “test” in the ordinary 
sense, for its result is informative only insofar as the observer (who 
administers the procedure) participates in the conversation, and to this 
extent his view is biased and equivocal. 

In a slightly different connection, MacKay points out that although 
such an observer is influenced by empirical data he must, at some stage, 
make an independent decision to “name” any machine that he deems
intellectual. At this point it ceases to be an arbitrary creature and 
becomes one of his own clan. 

Thus an artificial intelligence is a self-organizing system of type 
(iv) in 2.7(7) which learns about a symbolic environment in order to 
solve the problems posed by this environment. Further, its “concepts”
and its methods of problem solution are peculiarly human. The first
speciality lies in a specification of this symbolic environment in which 
the system acts. To elucidate it, we shall describe a rather simple 
problem-solving computer program devised by Kochen [100]. The 
second speciality is the human-like component of the artificial intel- 
ligence which may be introduced: 


(i) as a set of “heuristics” (to use a term proposed by Polya [101])
or broad rules and suggestions for problem solution, 

(ii) by close-coupled interaction between a man and the machine 
(literally by a conversation in which the machine acquires man-like 
habits of thought), 

(iii) by embedding constraints into the program that stem from 
psychological models of concept learning, or 

(iv) by embedding similar constraints derived from physiological 
models of the process that underlies concept learning. 


Obviously, these restrictions are imposed at different levels of
discourse. If we choose L^0 to comprehend the physiological or
mechanistic level of (iv), the constraints in (iii) are applied in L^1
(strictly in L^{η_1}, η_1 > 1) and those of either (ii) or (i) are applied in L^2
(or strictly in L^{η_2}, η_2 > η_1). Although these constraints may often be
applied jointly, it is convenient to make an arbitrary distinction 
between them. The heuristic constraints of (i) that lead to autonomous 
machines with little structural resemblance to a human brain are 
considered immediately. The constraints of (ii) are discussed in Section 
5 and those of (iii) and (iv), which give rise to rather abstract models 
of a human brain, are examined in Section 4. 


(2) Bruner, Goodnow, and Austin [102] performed a psychological 
experiment in which a sequence of cards, each displaying the presence 
or absence of attributes like color, shape, and number, were presented 
to a subject. With each card, the subject was informed whether or not 
the card belonged to an unknown subset of the possible universe of 
cards displaying these attributes, and he was required to assert his 
current belief about the composition of the unknown subset and ul- 
timately to define this subset with confidence. Depending upon the 
sequence of evidence, it is possible to deduce various characteristics 
of the unknown subset A. Bruner, Goodnow, and Austin were 
concerned with several cases but we shall concentrate initially upon 
the A of the kind they call conjunctive “concepts.” (Their usage of the 
word “concept” differs from the present usage. A conjunctive subset 
is a subset defined by the conjoint possession of several attributes.) 
To formalize the environment, consider binary attributes denoted
as x_i. Any exemplar (such as a single card) is defined by an n-component
binary vector X = x_1, x_2, ..., x_n and the entire universe of exemplars
by the 2^n vectors X.

When an exemplar is presented, say at an instant t, the student also
receives the information that it is or is not a member of A. Thus the
input to a student at the tth instant is

$$ V|t = \{x_1|t,\ x_2|t,\ \ldots,\ x_n|t,\ \xi|t\} = X|t,\ \xi|t, $$

where

$$ \xi|t = 1 \ \text{if } X|t \in A, \qquad \xi|t = 0 \ \text{if not.} $$


Since any A is a conjunctive subset it can be represented by an n 
component vector of three-valued variables of which one value indicates 
an indifference. 

Call these variables

$$ y_i = \begin{cases} 1 & \text{if } x_i = 1 \\ 0 & \text{if } x_i = 0 \\ z & \text{if } x_i \text{ is either 1 or 0.} \end{cases} $$

To illustrate the indifference value, we show the subsets A_1 = Y_1 =
[1, 0, z] and A_2 = Y_2 = [1, 1, 0] for the case of n = 3 in Fig. 27.

The output of a subject for an input sequence

$$ V|1,\ V|2,\ \ldots $$

is formally a sequence Y|1, Y|2, ... to which may be adjoined a
sequence of assertions about the value μ|t of his confidence in the
guess that Y|t = A.

We shall assume that the input sequence is arbitrary, probably 
redundant, but devoid of logical inconsistencies, with reference to the 
proposition that A is a conjunctive subset. In other words, sequence I 
of Fig. 28 is admissible, but sequence II is not. 
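
The representation and the admissibility condition can be made concrete
with a short sketch for n = 3. The function names are hypothetical, and
the two evidence sequences are illustrative inputs written as pairs
(X, ξ). The test used here is one way of deciding admissibility, assumed
for the sketch: form the most specific conjunctive description that
covers every exemplar reported as a member, and check that it captures no
exemplar reported as a nonmember.

```python
Z = "z"  # the indifference value


def matches(y, x):
    """True if exemplar x belongs to the conjunctive subset described by y."""
    return all(yi == Z or yi == xi for yi, xi in zip(y, x))


def tightest_hypothesis(positives):
    """Most specific conjunctive description covering all positive exemplars."""
    return [col[0] if len(set(col)) == 1 else Z for col in zip(*positives)]


def is_admissible(sequence):
    """True if some conjunctive subset agrees with every (x, xi) pair."""
    positives = [x for x, xi in sequence if xi == 1]
    negatives = [x for x, xi in sequence if xi == 0]
    if not positives:
        # Any single exemplar not yet ruled out is itself a conjunctive subset.
        n = len(negatives[0]) if negatives else 0
        return len(set(negatives)) < 2 ** n
    y = tightest_hypothesis(positives)
    return not any(matches(y, x) for x in negatives)


# Illustrative evidence: each entry is ((x1, x2, x3), xi).
seq_1 = [((0, 1, 0), 1), ((0, 1, 1), 0), ((1, 1, 0), 1),
         ((1, 1, 1), 0), ((0, 0, 0), 0)]
seq_2 = [((0, 1, 0), 1), ((0, 1, 1), 0), ((1, 1, 0), 1), ((1, 1, 1), 1),
         ((0, 0, 0), 0), ((0, 0, 1), 0), ((1, 0, 0), 1), ((1, 0, 1), 1)]

ok_1 = is_admissible(seq_1)   # consistent with the subset [z, 1, 0]
ok_2 = is_admissible(seq_2)   # no single conjunctive subset fits
```

The second sequence is still perfectly consistent as evidence; it simply
describes a subset that is a union of conjunctive subsets rather than a
single one.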

Kochen devised and tried out a number of different computer
programs, each of which embodied one of several plausible strategies
for guessing values of Y|t and asserting values of μ|t. He compared
these with one another according to various criteria and with the
performance of human beings. In each case, the program (artificial
intelligence “machines” M_1, M_2, ...) “guessed” Y|t = A, for various
conjunctive subsets A, before sufficient evidence had been examined to
prove, deductively, that Y|t = A. On the whole, the M programs
fared better than the human beings. Experiments were conducted for
values of n = 3, 4, 5, 6, 7, 8, 12, 15, and various values of the numbers
of z-valued entries in Y = A. (The symbolism adopted in the paper
which contains these results differs from our present symbolism.)

Fig. 27. Concept space. (Panel label: A_2 = Y_2 = 1, 1, 0.) Possible
sequences:
(I)  V|1 = 0,1,0,1  V|2 = 0,1,1,0  V|3 = 1,1,0,1  V|4 = 1,1,1,0  V|5 = 0,0,0,0
(II) V|1 = 0,1,0,1  V|2 = 0,1,1,0  V|3 = 1,1,0,1  V|4 = 1,1,1,1  V|5 = 0,0,0,0
     V|6 = 0,0,1,0  V|7 = 1,0,0,1  V|8 = 1,0,1,1, ...

The simplest M is M_1 of Fig. 28. The initial “hypothesis”, Y|1, is
that A = [z, z, ..., z] = Y|1. At any value of t it may be the case that


(i) X|t ∈ A ∩ Y|t, when Y|t is confirmed. If so, M_1 leaves Y|t
unchanged.

(ii) X|t ∈ Ā ∩ Ȳ|t, where Ā is the complement of A and where
Ȳ|t is the complement of Y|t, when Y|t is confirmed and M_1 leaves
Y|t unchanged.

(iii) X|t ∈ Ā ∩ Y|t, when the hypothesis is disconfirmed (because,
by definition, the value of ξ|t in V|t is 0 but Y|t, the current
hypothesis, asserts that this exemplar is a member of A). In this case,
M_1 changes its hypothesis. If j is the least value of i for which, in
Y|t, the entry is z, then y_i|t+1 = y_i|t for i ≠ j and the entry
y_j|t+1 is the complement of x_j|t.

(iv) X|t ∈ A ∩ Ȳ|t, when the current hypothesis is disconfirmed
(because the value of ξ|t in V|t is 1 but the current hypothesis
asserts that X|t is not a member of A). In this case y_i|t+1 = z for
all values of i such that the values of y_i|t are not logically or deduc-
tively determined in the sense of Fig. 28.
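
The four cases can be separated mechanically, as the following sketch
shows. It is a partial illustration only: it classifies an input into
cases (i) to (iv) and applies the narrowing step of case (iii), read here
as fixing the first indifferent entry to the complement of the
corresponding attribute of the offending exemplar (the only choice that
excludes that exemplar). Case (iv) is not implemented, because it depends
upon the notion of a logically determined entry defined in Fig. 28. The
function names are hypothetical.

```python
Z = "z"  # the indifference value


def contains(y, x):
    """True if the hypothesis y asserts that exemplar x is a member."""
    return all(yi == Z or yi == xi for yi, xi in zip(y, x))


def classify(y, x, xi):
    """Return 'i', 'ii', 'iii' or 'iv' for hypothesis y, exemplar x, label xi."""
    inside = contains(y, x)
    if xi == 1:
        return "i" if inside else "iv"    # member: confirmed or wrongly excluded
    return "iii" if inside else "ii"      # nonmember: wrongly included or confirmed


def narrow(y, x):
    """Case (iii): set the first z entry to the complement of x at that position."""
    y = list(y)
    for j, yj in enumerate(y):
        if yj == Z:
            y[j] = 1 - x[j]
            break
    return y


# Usage: start from the all-indifferent hypothesis and meet a nonmember.
y0 = [Z, Z, Z]
case_a = classify(y0, (0, 1, 0), 1)   # a member inside the hypothesis
case_b = classify(y0, (0, 1, 1), 0)   # a nonmember inside the hypothesis
y1 = narrow(y0, (0, 1, 1))
```

After the narrowing step the offending exemplar lies outside the new
hypothesis, which may in turn exclude a genuine member and so provoke
case (iv) later in the sequence.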


If C_1|t is the number of occasions upon which the current hypothesis
Y|t has been confirmed in the sense of X ∈ Y ∩ A and if C_2|t is the
number of occasions upon which Y|t has been confirmed by X ∈ Ȳ ∩ Ā
and if p|t is the estimated probability at the instant t that ξ = 1, and
1 − p|t is the estimated probability that ξ = 0, then, for M_1,


Pr LE =P Lt (ee Lt) 1— plc, Le. 


For each machine, M, a “distance” between the hypothesis and
A is computed in order to evaluate the performance of M. In fact,
Kochen used seven different machines. The form of μ|t was modified
(μ_1|t → μ_2|t) to remove undesirable asymmetries and the selection rules
(iii) and (iv) were changed so that modification of the logically undeter-
mined entries depended upon a “random” process (selecting which
entry should be modified). We comment that μ|t typically undergoes
sudden “insightful” transitions.

Whereas M_1 and some of the other machines retained the entire
sequence of inputs in a “memory” and compared all of them with the 
current state, the derived machines had a restricted memory. (In 
most cases the ‘“‘random’’ process improved the performance, and 
restrictions upon the “memory” did not appreciably impair the per- 
formance.) A typical derived “machine” is shown in Fig. 28 as M,. 

As Kochen very carefully points out, none of these systems satisfy 
a criterion of artificial intelligence of the kind we adopted in (1). 
However, a combination of the experimenter with the machine may 
make up for the deficiencies in M, namely:


(I) The machine does not learn.
(II) Its environment is restricted. 
(III) The denotation of signs is determined externally (since the 
identification of the attributes is determined externally). 
(IV) The machine does not build an hierarchical structure. 


Of these, (I) is countered by the comment that the experimenter
does the learning when he changes one machine into another, M_i → M_j.
The modifications introduced (the difference between M_i and M_j)
depend upon data that are computed by M_i, namely, the values of
p|t and of the distance function, and readily available parameters of
M_i behavior. (II) is not a real objection. The environment is restricted
by intention rather than necessity. Subsequent machines can, for
example, deal with disjunctive subsets A = A_1 ∪ A_2 ∪ ... which are
predicted by evidence of the form in sequence II of Fig. 28 and there
is no reason why other, feasible, M should not make guesses about
probabilistically defined subsets on the basis of ambiguous evidence
of the kind that is delivered by sequence III. Since the possible 2^(2^n)
subsets of the exemplars X consist of disjunctive subsets A or conjunc-
tive subsets A, this type of machine is unrestricted in its domain and
it can be shown capable of dealing with the indefinite and ambiguous
sequences of evidence that pose the inductively solvable problem of
characterizing a probabilistic subset.
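
A short count shows how much the move from conjunctive to disjunctive
subsets enlarges the domain. Each conjunctive subset corresponds to one
n-component vector over the three values 1, 0, and z, so there are 3^n of
them, whereas the exemplars have 2^(2^n) subsets in all. The sketch below
enumerates the conjunctive subsets for a small n to confirm the count;
the function name is hypothetical.

```python
from itertools import product


def conjunctive_subsets(n):
    """Return the set of all conjunctive subsets of the 2**n exemplars."""
    universe = list(product((0, 1), repeat=n))
    subsets = set()
    for y in product((0, 1, "z"), repeat=n):
        members = frozenset(
            x for x in universe
            if all(yi == "z" or yi == xi for yi, xi in zip(y, x))
        )
        subsets.add(members)
    return subsets


n = 3
n_conjunctive = len(conjunctive_subsets(n))   # 3**3 = 27
n_all_subsets = 2 ** (2 ** n)                 # 2**8 = 256
```

Every single exemplar is itself a conjunctive subset (no entry is z), so
any nonempty subset of the universe can be written as a union of
conjunctive subsets; this is why a machine that handles disjunctions is
unrestricted over this universe.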

On the other hand (III) implies a more serious deficiency. A machine
of this type is limited to situations in which both the attributes (of 
which values constitute relevant evidential data) and the objective or 
goal (the subset A) are well defined before problem solving begins. 
Obviously there are some situations demanding intelligent problem 
solving wherein this limitation is acceptable. But we can never judge 
whether or not the criterion of (1) applies unless the machine is capable 
of dealing with situations where the alternatives are not well defined. 
In this case, we know that the machine does not possess this capacity 
and the goal and the relevant data have both been selected by the 
experimenter. Unfortunately, there is no readily asserted algorithm 
that the experimenter adopts when he deals with the issues of (III). 
In fact, he can, at the best, rationalize his decisions by announcing 
some heuristics. Similar comments apply to (IV) if “heuristic” is replaced 
by “evolutionary” rule. 

(3) Artificial intelligence systems live in a symbolic environment 
comparable to the universe of binary vectors, but frequently of a much 
more elaborate kind. Sometimes the symbolic environment is restricted 
to geometrical propositions or logical expressions (as in Newell, Shaw, 
and Simon’s “Logic Theorist” [103]) or figures on a retina. Sometimes, 
as in Newell’s [104] General “Problem-Solving” program, the environment 
can be made up from any abstract objects and almost any 
relations between them. The system is provided with a set of operators 
with which it can act upon the objects in its environment and a set of 
differences, or distinguishing attributes that it can detect and which are 
used to discriminate between objects. 

A problem is posed by specifying some initial object and the goal of 
reaching some other object; for example, the initial object may be a 
logical expression and the goal object its proof or in the General 
Problem-Solving program [which we shall call G.P.S. (I)] any other 
object related to it by a sequence of transformations in the symbolic 
environment that corresponds to a sequence of processes in the artificial 
intelligence. 


(4) The majority of systems can be criticized on the grounds that 
they do not embody the gamut of processes that make them independent 
of the experimenter or the programmer. But, as suggested in 
(2) above, this criticism is trivial if the experimenter’s or the program- 
mer’s activity could be programmed. Hence it is very profitable to look 
at systems that are fragments of an artificial intelligence and which 
deal with special facets of problem solving providing that among them 
there are systems capable of assembling these fragments into a compo- 
site entity. 

Minsky [105] believes that five types of process are usefully distin- 
guished: 


(i) Search for a goal, involving a sequence of choices based upon 
the evidence derived from measures like: (I) the value of achieving 
a goal, (II) the proximity to a goal, (III) the amount of computation 
that is expected (or the length of algorithm needed) to achieve this 
goal, (IV) an index of which method (or type of algorithm) is best, and 
(V) an index of the cost of the computation involved. 

(ii) A process that reduces the ultimate solution of goal achieve- 
ment into partial solutions or subgoals. We comment that if the 
measures (I), (II), (III), (IV), (V) can be defined, then a suitable 
process exists. 

(iii) A heuristic procedure defining relations of similarity and of 
equivalence. It may be necessary, for example, to view a given and 
unsolvable problem as being equivalent to a problem that can be 
solved (when the same method is applied). It may be necessary to 
adopt a novel method which is similar to (but, on some grounds, is 
supposed to be more successful than) a previously adopted method. 
Indeed, Minsky and Selfridge [106], Travis [107], Marzocco [108], 
and others regard the basic heuristic of artificial intelligence as 
“Given a problem, apply a method of solution which is a generalized 
version of a method that was previously successful when applied to 
a similar problem.” 

(iv) Recognition of a pattern or, in a generalized form, the con- 
struction of a denotation or a connotation for expressions in the 
currently adopted language. Hence, the pattern concerned may be a 
set of relevant attributes [109] or a relevant configuration of attribute 
values, or a goal, or some method of problem solution [110]. 

(v) Learning whereby organizations evolve, differentiate, or adapt. 

We shall maintain these distinctions without, however, considering 
the items in a particular order. 


(5) These processes, which arise from the heuristic constraints of 
(1), are defined with reference to a very special system of symbols (and, 
as in 1.4, we are interested in the dynamics of this system). Hence, the 
existence of a given process entails organizations of the kind we dis- 
cussed in 2.7 and 2.8 which, in turn, entail physical mechanisms of the 
kind we considered in 2.6. But the correspondence between organiza- 
tions at these different levels of discourse is usually many to many. 

Thus an hierarchical ordering that gives one “subgoal” priority over 
another and which might be said to induce a kind of “preference” in 
an artificial intelligence bears no obvious relationship to the hierarchies 
that seem evident in a functional description of the physical entity 
in which the artificial intelligence is realized. (Consequently, as in (1), 
we cannot recognize an intelligent circuit.) Nor is there any reason why 
there should be such a relationship (terms like “priority” and “preference” 
become contentious because we have a sneaking feeling that there 
should). 

On the other hand, there is a basic analogy between the structure of 
a type (iv) self-organizing system and the structure of an artificial 
intelligence. In the most propitious case this amounts to an isomorphism. 

In an artificial intelligence program the unit of organization looks 
like: 

(i) Test a current hypothesis against a given set of data. 
(ii) Perform an operation that is selected according to the outcome 
of the test. 
(iii) Observe the result of this operation as reflected in the available 
data. 
(iv) Either return to (ii) or proceed to the next unit of organization. 


The unit is conveniently depicted as 

[Diagram: unit of organization] 

where “(C)” stands for hypothesis testing and “[>” stands for an 
operation that is performed. 

Since most artificial intelligence programs are written in a list processing 
computer language, it is relatively easy to make certain of the 
operations into the creation of novel tests or novel operations or the 
deletion of unwanted tests or operations. 

Now this unit of organization is isomorphic with the “tote” unit 
(test-operate-test-exit unit) which Miller et al. [111] use as the building 
block for mentation; and, as they point out, the existence of “tote” 
units is symptomatic of a “plan,” isomorphic with a program (just as 
the present “units” of organization are symptomatic of the program 
in which they are defined). The unit is also isomorphic with the realization 
of a controlled branching algorithm in the sense of Markov 
[112], or, as Hunt [113] argues, a recursive computation of the form 


$$
f(x) = \begin{cases} \alpha(x), & g(x) = a \\ f(\beta(x)), & g(x) = b. \end{cases}
$$


Finally, it is isomorphic with the adaptive subcontroller, in a type 
(iv) system. 
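
The correspondence between the test-operate unit and a recursive computation can be made concrete with a minimal Python sketch. It is an illustration only: the names `tote_unit` and `recursive_form`, the step limit, and the toy test and operation are assumptions introduced here and are not taken from Miller et al., Markov, or Hunt. The recursive version returns one function of its argument when a condition holds and otherwise calls itself on a transformed argument.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def tote_unit(state: S,
              test: Callable[[S], bool],
              operate: Callable[[S], S],
              max_steps: int = 1000) -> S:
    """Test, operate, test again, and exit once the test is satisfied."""
    for _ in range(max_steps):
        if test(state):            # test the hypothesis against the data
            return state           # exit to the next unit of organization
        state = operate(state)     # operate; the result is observed by the next test
    raise RuntimeError("test was never satisfied within max_steps")


def recursive_form(x, g, alpha, beta):
    """Return alpha(x) when g(x) holds, otherwise recur on beta(x)."""
    return alpha(x) if g(x) else recursive_form(beta(x), g, alpha, beta)


# Hypothetical toy problem: halve a number until it is at most 3.
result = tote_unit(40, test=lambda n: n <= 3, operate=lambda n: n // 2)
same = recursive_form(40, g=lambda n: n <= 3,
                      alpha=lambda n: n, beta=lambda n: n // 2)
```

Both calls pass through the same sequence of states, which is the sense in which the loop and the recursion describe one unit of organization.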

Similar comments apply to the assembly of basic units of organiza- 
tion into a system, only in this case it must be recognized that the list 
processing computer language (and the necessary predisposition to 
linear representation) imposes very severe limitations. 

The point is illustrated in Fig. 29 where item (1) somewhat extends 
the basic model of an hierarchy of adaptive control shown in Fig. 2 
and item (2) gives the isomorphic representation of this hierarchy in 
the selective representation of Fig. 18. In item (3) we apply the 
restriction that one and only one subcontroller is selected at once (thus 
introducing a sequential process). Item (4) demonstrates the iso- 
morphism that exists (given this sequential, one at once, restriction) 
between a “tote” unit and a subcontroller and item (5) is isomorphic 
with the hierarchical structure of item (3). 

There is nothing sacrosanct about sequential data processing. 
Oliver Selfridge’s “Pandemonium” [114], a parallel system we shall 
consider later, is a notable exception to this rule. As Newell [115] points 
out, a “Pandemonium” which is analogous to item (2) can take a 
broad view of all that is going on in a system, whereas a sequential 
program is written on the assumption that different parts of a compu- 
tation are separable and interact by closely prescribed channels. There 
are many circumstances under which a broad view is handy to have. 
But is it necessary? 

The trite reply is that any finite dimensional image and presumably 
any parallel system can be represented in terms of a linear sequence 
providing punctuation terms, indicating disposition, are adjoined to 
the alphabet of signs. Markov [112], for example, gives a construction 
for this purpose. Hence, there is some curiously encoded translation 
of a parallel machine which does the same tricks as the original. But 
is this the point? 

In particular it is a matter of doubt whether the linear shadow of a 
parallel organization can be realized in a physical fashion or whether it 
can evolve in a medium under rules of evolution that we are able to 
appreciate. 

3.2 Specific Processes 


(1) The simplest kind of goal achievement is evidenced by a controller. 
Of course, the goal (indicated by the maximum of a suitable payoff 
function) need not be easy to reach. Minsky and Selfridge [106] have 
considered various cases (simple optimization with “hill climbing” in 
the parameter space to a unique maximum, multiple maxima, the case 
of stochastic rather than deterministic “hill climbing,” and the intractable 
case of an isolated “pinpoint in a plain”). 

When there is only a single type of goal the search conducted by an 
artificial intelligence in its symbolic environment is analogous to the 
more difficult cases of control maximization; for example, search for 
one among the set of possible Boolean expressions is analogous to the 
“pinpoint in a plain” case. 
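
The contrast between an easy and an intractable search can be sketched in a few lines of Python. This is a toy under stated assumptions: the function `hill_climb`, the integer parameter space, and the two payoff functions are hypothetical and serve only to illustrate deterministic hill climbing; they are not drawn from Minsky and Selfridge.

```python
def hill_climb(payoff, start, lo, hi, max_steps=1000):
    """Deterministic hill climbing over the integers lo..hi.

    Moves to the better neighbouring point until no neighbour improves the payoff.
    """
    x = start
    for _ in range(max_steps):
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=payoff)
        if payoff(best) <= payoff(x):
            return x               # no improving neighbour: stop here
        x = best
    return x


def smooth(x):
    """Payoff with a unique maximum at 7 and a slope everywhere else."""
    return -(x - 7) ** 2


def pinpoint(x):
    """Payoff that is flat except for an isolated peak at 7."""
    return 1 if x == 7 else 0


peak = hill_climb(smooth, start=0, lo=0, hi=20)      # the slope leads to the maximum
stuck = hill_climb(pinpoint, start=0, lo=0, hi=20)   # no gradient, the search stays put
```

On the smooth payoff the climber reaches the maximum from any starting point, whereas on the flat payoff it finds the peak only if it happens to start next to it.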

On the other hand, this kind of goal seeking is unusual in the real 
world. Intelligent creatures aim for many and diverse objectives, using 
vastly different methods. Some of this richness is preserved in G.P.S. 
(I) by allowing several types of goal [116]. 

To describe these, let us assume that the symbolic environment of 
G.P.S. (I) is logic. The objects available within this environment consist 
of logical expressions like “A ∨ B” and “A ⊃ (B ∨ C).” The operations 
given to the G.P.S. (I) are transformations like “A ∨ B ⇒ B ∨ A” 
and like “B ⊃ A ⇒ ¬B ∨ A.” The differences between objects are of 
the form “changed position” and “changed connective.” The given 
operations are only relevant to certain of the differences, for example, 
the operation “A ∨ B ⇒ B ∨ A” is only relevant to a difference in 
the position of the variables concerned and, further, any logical expression 
that is converted into the form “A ∨ B” can be transformed 
into another logical expression “B ∨ A” such that the only difference 
lies in the position of the variables. The relevance of different operations 
is conveniently described by a binary application matrix with a “1” 
indicating relevance and a “0” a lack of it: 


                   Operations F 

    Differences    1   0  ...  1 
         G         0   1  ...  0 


Now the main types of goal in G.P.S. (I) are: type T, transform an 
object a into another object b; type R, reduce a difference d existing 
between a pair of objects a, b; type A, apply an operator F to an 
object. 

When G.P.S. (I) is posed a problem, it evaluates the problem, as in 
Fig. 30, and if the problem is accepted it decides upon a method of 
solution which is associated with one of the types of goal. Each method 
applied to the initial object in pursuit of the selected type of goal will 
produce subgoals which are often of different type. The recursive 
character of this goal-directed computation is apparent from an in- 
spection of Fig. 30. Thus a type T goal involves a method that tests a 
difference between a and b. If no difference exists, this goal is achieved. 
If a difference does exist the test leads to the type R subgoal of reducing 
the difference. This type R subgoal entails testing for the relevance 
of an operation F to this difference and induces the type A subgoal of 
applying F to reduce this difference. Similarly, type A goals lead either 
to type A subgoals or to type T subgoals. The entire computation 
terminates either on G.P.S. (I) discovering the solution object or dis- 
cerning that it cannot solve the problem posed to it. 
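
The way the three goal types call one another can be sketched as a small recursive program. What follows is a deliberately reduced illustration in Python and not the G.P.S. (I) program itself: the tuple encoding of expressions, the two operators, the depth limit, and all function names are assumptions made for the example. In particular, a type A goal here simply fails when the operator does not fit, whereas in the account above it may raise further subgoals.

```python
# Expressions are tuples (connective, left, right), e.g. ("or", "A", "B").

def commute(e):
    """A or B => B or A; returns None when the operator does not apply."""
    if isinstance(e, tuple) and len(e) == 3 and e[0] == "or":
        return (e[0], e[2], e[1])
    return None


def implication_to_disjunction(e):
    """B implies A => (not B) or A; returns None when it does not apply."""
    if isinstance(e, tuple) and len(e) == 3 and e[0] == "imp":
        return ("or", ("not", e[1]), e[2])
    return None


OPERATORS = {"commute": commute,
             "implication_to_disjunction": implication_to_disjunction}

# Binary application matrix: rows are differences, columns are operations.
APPLICATION = {
    "changed position":   {"commute": 1, "implication_to_disjunction": 0},
    "changed connective": {"commute": 0, "implication_to_disjunction": 1},
}


def difference(a, b):
    """Return the name of a difference between a and b, or None if they match."""
    if a == b:
        return None
    return "changed connective" if a[0] != b[0] else "changed position"


def transform(a, b, depth=5):
    """Type T goal: transform a into b; returns a list of operator names or None."""
    d = difference(a, b)
    if d is None:
        return []                        # goal achieved
    if depth == 0:
        return None                      # give up: problem not solved
    return reduce_difference(d, a, b, depth)


def reduce_difference(d, a, b, depth):
    """Type R goal: find a relevant operator and apply it."""
    for name, relevant in APPLICATION[d].items():
        if relevant:
            c = apply_operator(name, a)              # type A subgoal
            if c is not None:
                rest = transform(c, b, depth - 1)    # new type T subgoal
                if rest is not None:
                    return [name] + rest
    return None


def apply_operator(name, a):
    """Type A goal: apply the named operator to the object a."""
    return OPERATORS[name](a)


plan = transform(("imp", "B", "A"), ("or", "A", ("not", "B")))
```

The returned plan lists the operators in the order in which they were applied; a result of `None` corresponds to the system discerning that it cannot solve the problem within the assumed depth limit.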


(2) The recursive character of the goal-directed organization (which 
led to a very convenient “list processing”) reduced the potentialities 
of G.P.S. (I) by rendering it too inflexible. [Newell [117] points out 
that the price paid for the flexibility of many goal types, an advance 
over the rather earlier “Logic Theorist” program, was the restricted 
organization of the search process in G.P.S. (I).] It is a characteristic 
of this essentially sequential organization that tests and subgoals 
cannot readily be reactivated and that control resides at any instant 
in a rather isolated subroutine. Newell [117] describes various methods 
that were tried to overcome this difficulty, for example, to impose an 
over-all control upon the subgoal routine which evaluated the success 
of each subgoal and, as a result, returned to the previous stage in the 
search or allowed the search to continue. A highly centralized system 
like this has the defect that the overall controller (the central authority) 
receives inadequate information on which to base its decisions. 

The program was ultimately reformulated [as a program G.P.S. 
(II)]. As indicated in Fig. 30 the search “tree” structure of G.P.S. (I) 
is replaced by a kind of mobile executive which assembles data about 
goal achievement, and refers back to an over-all specification of the 
method to be adopted. This structure constitutes a compromise, on the 
way to an ideally much more parallel organization. 


(3) As it stands, the objects and the differences and the operations 
are introduced by a programmer. But a practically useful artificial 
intelligence should be able to learn the objects, operations, and differ- 
ences that it needs in order to solve problems. 

A couple of learning situations have been considered and partially 
simulated in G.P.S. The first is “learning the entries of the application 
matrix,” given that the operations F and the differences G are defined. 
It is the paradigm case for association learning. 

The naive approach to this problem is a collection of statistical 
registers that estimate how successfully the application of arbitrarily 
selected operations reduces each difference. The matrix is constructed 
by placing a “1” whenever the success of an operation with reference to a 
difference is above some limiting value and a “0” if it is not. But the 
matrix can, in fact, be learned more efficiently by using the algorithm: 


“Apply the test for a difference to each operation and if the result 
is positive enter “1” in the application matrix. If the result is negative, 
enter “0” in the application matrix.” 


Essentially, the learning process has been removed from the domain 
of statistical aggregation and placed in the domain of problem solving. 
Hence it can be tackled by a problem solving machine. 
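
A minimal Python sketch of the second method is given below. It rests on an assumption about what “applying the test for a difference to an operation” means: each operation is represented by an input form and an output form, and a difference test is run on that pair. The dictionary names, the tuple encoding of expressions, and the two test functions are hypothetical illustrations.

```python
def variables(e):
    """Variable names of an expression, read from left to right."""
    if isinstance(e, str):
        return [e]
    return [v for part in e[1:] for v in variables(part)]


# Each operation is given by an (input form, output form) pair.
OPERATIONS = {
    "commute": (("or", "A", "B"), ("or", "B", "A")),
    "implication_to_disjunction": (("imp", "A", "B"),
                                   ("or", ("not", "A"), "B")),
}

# Each difference is denoted by a program that recognizes it.
DIFFERENCE_TESTS = {
    "changed position": lambda a, b: variables(a) != variables(b),
    "changed connective": lambda a, b: a[0] != b[0],
}


def learn_application_matrix(operations, tests):
    """Enter 1 where the difference test is positive for an operation, else 0."""
    return {
        diff: {name: int(test(inp, out))
               for name, (inp, out) in operations.items()}
        for diff, test in tests.items()
    }


learned = learn_application_matrix(OPERATIONS, DIFFERENCE_TESTS)
```

No statistical registers are needed: every entry follows from a single test, which is the sense in which the learning has been moved into the domain of problem solving.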

Next, we consider the much more interesting matter of learning a 
novel set of differences so that the problem solving machine, equipped 
with this learning algorithm, can partially build (or specify the domain 
of) its application matrix. This question is a special case of “similarity 
learning” and is obviously a basic issue in all discussions of intelligent 
and adaptable perception. It yields the paradigm case for concept 
learning. 

For this purpose we label the identified processing language in terms 
of which problems are posed as L° = V°, 2°, &°. Formally L° consists 
of a finite vocabulary V°, consisting of the names for objects and opera- 
tions and differences, rules 2° which restrict the strings of signs that 
can be generated by applying operations to objects in pursuit of some 
type of goal, and an identification &° between sets of object names 
and objects and operation names and operations and between difference 
names and the programs that recognize differences. (These programs 
are necessary components in the process of achieving type T and type 
R goals.) It is essential to realize that L° is well defined and fixed, 
that the identifications of the signs are fixed, and that the objects they 
denote are fixed. Hence, the set of differences are fixed and there is no 
possibility of learning a novel set of differences within L°. Similarly, 
once we have agreed to accept certain properties of the physical mark 
“LION” as relevant (the usual indices of its constituent characters) 
and to credit the mark “LION” with a certain denotation (a member 
of the usual set of animals) we cannot learn a novel set of differences 
between, say, “LION” and “GIRAFFE.” There is, of course, a world 
of difference between the elaboration of the human being’s sign system 
and L°, and it is also true (as we shall argue later) that the human being 
seems to have no fixed processing language (whereas G.P.S. must have). 
But we may agree to fix the language we use, as in some kinds of 
argument, and, if we do, we are in much the same position as G.P.S. 
Notice, by the way, that the goals of G.P.S. are not specified in L*, 
although the system can decide between types of subgoal. The goals 
appear in G.P.S. as instructions from the experimenter or programmer 
that have a well-defined connotation in L°. 

Now the programmer takes a much wider and more comprehensive 
view. He knows perfectly well, for example, that the objects denoted 
by object signs v ∈ V° are not unitary entities but consist of parts. He 
also knows that either these parts or the entire objects are capable of 
description in terms of many different attributes such as, in the case 
when the objects are logical expressions, the possession of constituent 
symbols, of being right-hand or left-hand members of larger expressions, 
or having a given connective. 

So far as the programmer is concerned, there are a vast number of 
possible differences obtainable by comparing the attributes or features 
(collections of attribute values) chosen to describe the objects. For one 
reason or another he has chosen a few specific differences, has written 
programs to detect these differences, and has denoted these recognition 
programs by the difference signs in L°. It is, of course, equivalent to say 
that the L* view of the world is more comprehensive than the L° view 
of the world and, as before, we shall use this formalism. In fact L° is 
specified by the programmer (or defined in the scientific metalanguage 
L*). 

Since a novel set of differences cannot be learned in L°, the question is 
“What further structures must be defined in L* in order to permit 
difference learning?” 

We first answer the auxiliary question, “What is the form of difference 
learning?” 

Since the denotation of a difference sign in G.P.S. is a recognition 
program, the act of learning a novel set of differences must involve 
writing difference recognition programs. 

It is thus necessary to provide an identified language L¹ = V¹, 2¹, &¹, 
in which programs that act as recognition programs in L° can be constructed 
and compared with one another. These recognition programs 
must be more than concatenations of the programs originally defined 
in L°. (They will be composed, in general, from programs able to recognize 
more elementary features of the objects denoted by L° signs.) 
Since further axioms are adjoined for this purpose, L¹ is a metalanguage 
with reference to at least some expressions in L°. 

Given that L¹ is defined in L* it is possible to denote the elements of 
a higher level problem environment. The objects in this problem environment 
are subsets of the difference recognizing programs denoted 
by L° difference signs. The operations in this problem environment are 
capable of modifying these objects (denoted by L¹ object signs), for 
example, by deleting elements from or adding them to the sets concerned, 
and they are related in L¹ to the algorithms used in program 
construction. 

Differences exist between the attributes of sets of the original differences 
and the denotations of L¹ difference signs will be higher order 
programs that recognize these differences. The ultimate goal for operations 
denoted in L¹ will be to achieve programs that recognize a “good” 
or “adequate” set of L° differences. 

Now the whole of this hierarchical construction is somewhat arbitrary: 
the choice of the operations denoted in L¹, the choice of objects, and 
of differences. The programmer or experimenter does the choosing and 
he justifies his selection by reference to canons of rationality or efficiency 
that make good sense in L*. However, we, who also communicate in 
terms of this language, may agree to the choice; for example, we may 
agree that the operations stipulated tally with the operations we say 
we perform when solving problems in everyday life. Our agreement in 
this matter specifies the constraints 2¹ that determine permissible 
strings of signs u ∈ V¹ and adds whatever sanction is necessary to the 
form of V¹ and of &¹. (We have tacitly agreed to quite a lot already 
by sanctioning L° and its identification.) 

What method should be adopted for building and selecting the pro- 
grams in this higher level problem environment? It is, of course, 
quite possible to apply the naive algorithm of generating strings of 
signs by “chance” and selecting those which satisfy the criteria (i) of 
being programs and (ii) of being able to recognize a “good” or “accep- 
table” set of L° differences. 

However, as Newell, Shaw, and Simon point out, this is likely to be 
an impractical procedure and it is certainly unnecessary. If we are 
able to specify sensible goals in the higher level problem environment 
(which amounts to giving a sensible interpretation to “good” and 
“acceptable”) then we are certainly in a position to advocate a more 
provident mechanism. As in the case of learning the entries of the 
application matrix they recommend that G.P.S. should be used in its 
normal mode of activity as a problem solver (in other words that the 
problem of finding a “good” or “acceptable” set of L° differences 
should be solved by the methods used in the lower level problem 
environment to find and transform logical expressions). Since the 
denotation of the domain of the lowest level in G.P.S. is arbitrary (the 
objects may be logical expressions or images or sets of control variables) 
there is no objection to this proposal and it can be argued that the prop- 
osal is optimal in the sense that it minimizes the number of axioms that 
need to be introduced. In a general theory of constructive problem solv- 
ing automata, as in 2.8, the argument for optimality is very strong indeed. 


(4) Suppose that this proposal is adopted. We then arrive at two important 
conclusions: 


(I) The activities going on at different levels in an hierarchically 
organized problem solver are analogous and in a suitable representa- 
tion may be isomorphic. 

(II) The process which we call “learning” in an automaton which 
solves problems communicated to it in L° is no more nor less than 
the activity we should call “problem solving” if we communicated 
with the system in L¹. 


Hence an over-all prescription for difference learning is to create a 
problem solver (in a broad sense of the word which would include, 
among other things, an adaptive controller) and to make it solve 
problems posed at different levels in an hierarchy (the solution to 
higher level problems determining the differences used by the lower 
level processes). Although there are many technical difficulties, there is 
no reason why this construction should not be applied to “operation 
learning” as well as “difference learning,” in which case the solutions 
of higher level problems determine the structural parameters of the 
lower level systems. Further, the hierarchy can be extended by adding 
L² ... Lᵐ. Manifestly the organization is isomorphic with the hierarchical 
structures of 2.7 and 2.8. In the present case we call it an 
artificial intelligence because the constraints upon the system resemble 
the constraints upon our own problem solving. 


(5) In a recent paper, Newell [118] gives a fairly detailed construction 
for a difference learning system and stresses the point that recognition 
(or, in a system capable of changing its own denotative faculties, of 
perception) must be mediated in the same language as its operation 
(or, at the lowest level, of manifest behavior). Very much the same 
argument is advanced by Mittelstadt [119], MacKay [120], and others. 
MacKay, for example, rejects the commonly proposed mechanism of 
Fig. 31(1) in favor of the mechanism of Fig. 31(2) which compares an 
input from its world with the actions it will perform to modify its 
world and acts to reduce the difference between the two. The actions 
are engendered by a self-organizing system which tacitly constitutes 
an internal representation of the environment, and it is evident that 
comparison can only take place between similar representations. In 
our present terminology only expressions in the same language are 
comparable, unless some other process, such as a translation, is intro- 
duced into the model. 

Returning to Newell’s [118] construction but using our own nomenclature, 
there is a problem environment with objects a, b, ... (that are 
assumed to be logical expressions) denoted by v ∈ V° of L°. The artificial 
intelligence is also provided with a more discriminating perceptual 
apparatus able to discern certain attributes of the objects in the problem 
environment and with an identified language L¹ in which it is possible 
to construct propositions about the attributes of and the differences between 
objects. (This does not contradict the assumption that problems 
are posed by communicating with the artificial intelligence in terms of 
L°.) If expressions in L¹ are described in graphical notation so that 
attributes appear as the branches of a structural graph, the L¹ images 
of a and b will appear as shown in Fig. 32(1). The difference between 
a and b is the difference structure of Fig. 32(2). Only differences that 
can be reduced by applying operations to the objects involved are 
relevant, and the difference structures corresponding to these differences 
are obtained by matching the input and output of an operation 
as indicated in Fig. 32(3). Equivalently it is possible to synthesize 
operations that reduce the differences that have been discerned. But 
these operations determine expressions in L¹. Hence an operation 
derived from the most primitive elements in a difference structure would 
be inapplicable. Useful operations correspond to classes of operations 
capable of reducing these primitive differences and L¹ must be able to 
represent the aggregation of these classes. (Expressions in L¹ must define 
the relationship between the primitive and the sophisticated forms of 
operation.) The process used for this purpose in Newell’s construction is 
the abstractive sorting program EPAM of Feigenbaum and Simon [121] and 
Feigenbaum [122] (but any other abstractive mechanism could be used). 

The ultimate criterion for selecting an abstract difference or a 
sophisticated operation is the possibility of using it to generalize and 
make inductive inferences. We have already pointed out that only a 
few of the possible abstractions lead to words that are capable of 
generalization, and one of the chief criteria in constructing L¹ must be 
that its pertinent expressions have this property. Newell embeds the 
principles of generalization in G.P.S. as heuristics that suit plausible 
arguments in L*. One principle is a unity of symbols, thus one “A” 
is taken as the same as another “A” and one “B” is taken as the same 
as another “B” regardless of the expressions in which they occur. On 
applying these principles to the difference structure of the sophisticated 
operation ‘A’? we obtain the generalized form of difference which 
corresponds to an extension of “d” and which is shown in Fig. 32(4). 

Given the comments we make in 3.2(7) and 3.2(8) on the interpret- 
ation of goals, there seems to be no reason why principles of general- 
ization must be embedded in such an explicit fashion. 

As one alternative, the system could generate its sophisticated 
operations according to rules that determine the way it looks at the 
environment. The same generation rules may govern its abstractions 
from primitive attributes. Broadly speaking, the system is an evolu- 
tionary device that aims to impose its own pattern upon the environ- 
ment and the principles of generalization, which we shall return to 
discuss in 3.2(6) are implicitly embodied in its structure. 

(6) Amarel [123, 124] has programmed an artificial intelligence which 
learns to present and prove theories. It does so by learning to build 
programs. Relative to G.P.S. it has only a modest problem environment 
but the system is worked out in great detail and the basic ideas (in 
particular the need for an hierarchy of languages) are exhibited un- 
ambiguously. We shall not describe the entire system (nor attempt 
to detail any part of it) and the original paper should be consulted to 
expand the present outline. 

The problem environment is a set labeled $\sigma$, consisting of 16 elements 
that constitute the nodes of the symmetric lattice in Fig. 33 under an 
ordering relation “>” so that, for any pair $z_1 \in \sigma$, $z_2 \in \sigma$, it will be the 
case that either $z_1 > z_2$ or $z_2 > z_1$ or that $z_1$ and $z_2$ are incomparable. 
Let $z_0$ be the uppermost point of this lattice and let $u_0$, $u_1$, $u_2$ be variables 
with values that index $z \in \sigma$. A transformation $T_j$ from the product set 
$[\sigma, \sigma]$ into $[\sigma]$ is explicitly defined by a set $C_{T_j}$ of $16^2$ correspondences 
$u_0 = T_j(u_1, u_2)$. The job of the system is to learn, at the $n$th move in its 
history, a program $P_{T_j}(n)$ which, given any pair $u_1$, $u_2$ in the domain 
coordinate of a set $C_{T_j}(n)$, will successfully compute the value 
$u_0 = T_j(u_1, u_2)$ (where $C_{T_j}(n) \subset C_{T_j}$ is the particular subset of correspondences 
which has been presented to the system at its $n$th move). 

If a successful computation is performed for all pairs $u_1$, $u_2$ and if 
$C_{T_j}(n) \subset C_{T_j}$ it can be argued that $P_{T_j}(n)$ constitutes a theory about 
$T_j$ when it is expressed in a suitable language. Similarly, when the 
system creates and evaluates tentative programs $P_{T_j}(m)$, $n > m$, which 
are not necessarily successful, it can be argued that the machine 
language representations of these programs constitute hypotheses about 
$T_j$ based upon the evidence available at the $m$th move. The problem 
environment is open to a number of plausible interpretations; for 
example, since any pair of the elements contained in $\sigma$ have, by definition, 
a complement and a G.L.B. and an L.U.B., we may write expressions 
like “$T_j$ = G.L.B. ($u_1$, $u_2$)” or “$T_j$ = L.U.B. ($u_1$, $u_2$)” or 
“$T_j$ = Comp [G.L.B. ($u_1$, $u_2$)]” or “$T_j$ = Comp [L.U.B. ($u_1$, $u_2$)].” 
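
The relation between presented correspondences and surviving hypotheses can be illustrated with a small Python sketch. It is not Amarel's program. It assumes, for the sake of the example, that the 16-element lattice is the Boolean lattice of subsets of a four-element set, encoded as the integers 0 to 15, so that G.L.B., L.U.B., and complement become bitwise operations. The candidate list and all names are hypothetical.

```python
from itertools import product

TOP = 0b1111                 # uppermost point under the assumed encoding
ELEMENTS = range(16)         # the 16 lattice elements as 4-bit integers


def glb(u1, u2):
    """Greatest lower bound: set intersection."""
    return u1 & u2


def lub(u1, u2):
    """Least upper bound: set union."""
    return u1 | u2


def comp(u):
    """Complement within the lattice."""
    return u ^ TOP


# Candidate expressions for the unknown transformation.
HYPOTHESES = {
    "G.L.B.": glb,
    "L.U.B.": lub,
    "Comp G.L.B.": lambda a, b: comp(glb(a, b)),
    "Comp L.U.B.": lambda a, b: comp(lub(a, b)),
}


def consistent(evidence):
    """Names of the hypotheses that reproduce every presented correspondence."""
    return [name for name, t in HYPOTHESES.items()
            if all(t(u1, u2) == u0 for (u1, u2), u0 in evidence.items())]


# The hidden transformation is taken to be G.L.B. for this demonstration.
full = {(u1, u2): glb(u1, u2) for u1, u2 in product(ELEMENTS, repeat=2)}
weak = {(0b1111, 0b1111): 0b1111}               # one correspondence only
strong = {**weak, (0b0011, 0b0101): 0b0001}     # a second, more telling one

after_weak = consistent(weak)
after_strong = consistent(strong)
```

With a single correspondence two hypotheses remain tenable; the second correspondence eliminates one of them, which mirrors the passage from hypothesis to theory as more evidence is presented.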


Fig. 33. The lattice. 


The system embodies knowledge about the structure and the extent 
of the problem environment and it is provided with certain basic 
problem solving facilities which are represented in an identified 
processing language $L_1^0$. 

The vocabulary $V^0$ of $L_1^0$ consists of signs denoting elements of 
$\sigma^* = \sigma \cup$ empty set; signs denoting the elements of $\rho$ (a set of all subsets 
of $\sigma^*$); signs for collections (processing lists) of the elements of $\rho$; and signs 
for program statements, for association, and for equality. In addition, 
there are signs denoting basic, inbuilt, logical operations like “$\cap$” 
and “$\cup$” and for procedures like searching a list and adding members 
to a processing list. A feature of $L_1^0$ which we shall return to in 3.3 is 
that it is “open ended.” In other words, additional operations and 
“compound operations” can be added to $V^0$ as they are developed and, 
in our own nomenclature, additions of this kind amount to transformations 
of the form $L_1^0 \to L_2^0 \to \ldots$ Since we shall not pursue the action of the 
program in detail, and since Amarel uses a slightly different notation, 
we shall refer to the processing language simply as $L^0$. 

The programs $P_{T_i}$ are represented in $L^0$ as strings of $L^0$ statements. 
It is possible to replace the operations appearing in these strings by 
operational variables $X_j$ (with values that are operations) and the 
characterization of each $X_j$ (a term $\Lambda_j$ that denotes the domain and the 
range of its value). Thus, for example, we may write “$X_1 = \cap$, $\Lambda_1$: 
$[\rho, \rho] \to \rho$” or again “$X_2$ = Comp, $\Lambda_2$: $[\sigma^* \to \sigma^*]$.” The crucial 
importance of the characterization $\Lambda$ appears in connection with 
relevance, or, as we called it in the discussion of G.P.S., applicability. 
We say that $X$ is relevant to $Y$ if (i) domain $X$ = domain $Y$ and if 
(ii) range $X \subset$ range $Y$. Compound operations in $L^0$ can be replaced by 
strings of operational variables $X_j = [x_1, x_2, \ldots, x_k]$, and their 
characterization. 
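
The relevance test stated above can be sketched in a few lines. The `Characterization` record and the example sets are hypothetical illustrations rather than Amarel's data structures; a domain is taken to be a tuple naming the argument sets, a range is a set of values, and containment of ranges is read as allowing equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characterization:
    domain: tuple       # names of the argument sets, e.g. ("sigma", "sigma")
    range_: frozenset   # values that the operation may return


def relevant(x, y):
    """X is relevant to Y if domain X = domain Y and range X lies within range Y."""
    return x.domain == y.domain and x.range_ <= y.range_


SIGMA = frozenset(range(16))

target = Characterization(("sigma", "sigma"), SIGMA)
narrow = Characterization(("sigma", "sigma"), frozenset({0, 1, 2, 3}))
unary = Characterization(("sigma",), SIGMA)

print(relevant(narrow, target))  # same domain, narrower range
print(relevant(unary, target))   # different domain
```

Only operational variables that pass this test need be considered as substitutions, which is how the characterization prunes the search.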

Programs $P_{T_i}$ are assembled in an open-ended language $L^1$. The 
vocabulary $V^1$ of $L^1$ contains strings of signs representing transformations 
$T$, signs denoting the initial set of operations “$\cap$,” “$\cup$,” and so 
on, together with signs for operations that are added to $V^0$ and strings of 
signs for simple and compound operational variables (and their 
characterization). In addition there are substitution rules, compatible with 
$L^1$, which permit the recursive assembly of a string representing a 
program $P_{T_i}$, the initiation of this recursive process, and its termination. 
(In terms of G.P.S. these substitution rules are broadly equivalent to 
the methods for goal achievement.) 

The initial statement, posing a problem to this system, will have the 
form “X >, 4; = A)” where “A,” is the initial string in L'. Suppose X, 
is a compound operational variable characterized by Ap and that it is 
relevant with reference to X,,; the corresponding string 4, in Z, with 
Ap as its left-hand member, may be substituted in the original expres- 
sion so that we obtain “X74j > X7,A;’ or equivalently “A, -> A,.” 
The continuation rules allow for insertion of $L^1$ strings between the 
strings of any suitable $L^1$ expression while the termination rules apply 
when the resulting string is a completely consistent expression in $L^1$. 
This will occur when, as a result of substitution, the compound opera- 
tional variables pertinent to the original expression have been evaluated, 
in the sense that each member of the set of constituents, for example, 
each member of $X_j = [x_1, x_2, \ldots, x_k]$ has been replaced by an operation. 
The process involves testing for applicability or relevance (as illustrated 
by the initial substitution) and, whenever more than one alternative is 
possible, making a decision to select among the possibilities. The 
organization involved can be represented as a “tree,” as in Fig. 31(2). 
At the point when evaluation is complete there will be well-defined 
paths in this tree (terminating at operation nodes) and, if this condition 
is achieved at the $m$th move, this structure is equivalent to a string 
$\Lambda(m)$ in $L^1$ that represents a program $P_{T_i}(m)$. The program is now 
translated into $L^0$ and tested. The tests are rather elaborate but 
embody the criteria of 3.1(4) (i). Depending upon the outcome of these 
tests $P_{T_i}(m)$ may be incorporated as a composite operation, it may 
be modified, it may be rejected or accepted for further testing against 
the evidence of $C^*_{T_i}(m + 1)$. Hence the decisions made in the development 
of the program tree may either be substitution decisions or 
decisions that modify the program tree by adding nodes. In general the 
results of testing are fed to each relevant decision node and in some 
conditions the over-all program, which mediates the assembly rules 
and makes the decisions whenever an ambiguity has to be resolved, 
calls for the generation of a novel compound procedure. (This will occur 
when the compound operational variables cannot be substituted in a 
manner yielding tentative programs that satisfy the tests.) This novel 
compound procedure is generated by an auxiliary mechanism (at the 
moment, by the experimenter himself). 

Since the number of possible program trees that might develop is 
enormous, the realization of successful testing and control depends 
upon an at least local metric in the set of program trees. Further, the 
over-all control upon the development of trees is an evolutionary 
process and at least the program value tests and the program elabora- 
tion tests will resemble an economic control in which the long term 
value of a program is pitted against the cost of maintaining the struc- 
ture required for its performance. Finally, efficient convergence towards 
the solution of a problem depends upon the possibility of recognizing 
ambiguous situations which are similar, and when they recur of making 
a similar decision. 

(7) In Amarel’s system, this feature appears in an important but 
rudimentary form as the transfer of decisions between the decision 
nodes. The same feature is broadly expressed by Minsky’s generalization 
heuristic of 3.2(4)(iii) which is, “Given a situation that calls for a 
decision, recognize a similar, previous, situation and apply a close 
variant of any algorithm that was successfully applied on this previous 
occasion.” 

It is no accident that this is a special case of an ‘awareness’ heuristic. 
We might call an artificial intelligence “aware” or even “conscious” if 
(i) it can recognize its present configuration (as similar to its past 
configuration) and generalize its structure by applying whatever 
algorithm is likely to maintain its integrity and (ii) (which we have 
guaranteed by embedding this artificial intelligence in a linguistic frame- 
work) if it can name an invariant of its configuration. 

All this leaves open the issue of what similarity criteria and what 
measures of success should be used. So far as an artificial intelligence is 
concerned, these similarities and measures must be chosen so that the 
matching process which underlies its problem solving activity has the 
caliber of “proof making” or possibly of “inductive inference” (however 
weakly either of these are defined). Conversely, a choice of this kind 
has been made, though possibly not in an explicit fashion, for any suc- 
cessful artificial intelligence. We stress the dissemination of ‘‘proof 
making” or “inductive inference.” It permeates and colors every 
action in the system. (Hence, ingenious and efficient procedures, such 
as an inductive inference algorithm proposed by Solomonoff [125], are 
probably too specialized, for the machine is given inductive inference 
as a special faculty rather than part of its character.) Wiener [7] has 
discussed the matter chiefly from a mathematical point of view and 
examined the restrictions entailed by having a finite machine and an 
orderly environment. 

Viewed at this level, the whole issue is very difficult. It is, after all, 
a hoary talking point among philosophers. Fortunately, a somewhat 
analogous situation pertains at the level of perception and motor 
behavior and the aura of mystique evaporates within this relatively 
tangible framework. The question in this connection is, “what kind of 
abstractions and what kind of program synthesizing and goal achieving 
mechanisms are compatible with an efficient machine?” To avoid 
dispute over criteria of efficiency, let us lay down the dictum “efficient 
enough to survive in a given and reasonable environment.” 

Wiener [7] has also examined this question. The fact is that any 
sensible machine must characterize the data it receives as some kind 
of Gestalt; and when it acts upon an orderly environment or when 
(at a higher level) it manipulates and tests an internal representation 
of this environment, it must compute universals. McCulloch and Pitts 
[47] exhibited the first explicit mechanism to achieve this objective and 
they did so in a physiologically plausible fashion. Ullman [126] has 
recently built a device, capable of learning to organize motor activities 
constrained by the possible motions of the joints in a limb, which has 
the required properties. 

The simplest case will illustrate the kind of restriction that is needed. 
To abstract a Gestalt, the relevant test operations of an artificial 
intelligence must be transformations that belong to a finite group. 
Further, it must be possible to define a functional (which may correspond 
to a perceptual attribute) that assigns a unique number to each 
transform of some input object that belongs to this group (in other 
words, to each test transformation of an input). The group average of 
this functional (taken over a space of the parameters that index the 
several transformations) will determine an invariant. (In many cases 
an average taken over a suitable subset of points in this parameter 
space will have the same property.) Finally, an ordered set of group 
averages is a Gestalt. 
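
As a minimal sketch of the construction just described, take the finite group to be the cyclic shifts of a binary pattern (an assumption made here only for illustration). Each functional in the hypothetical tuple `FUNCTIONALS` assigns a number to a transform; its average over all shifts is unchanged when the input is itself shifted, and the ordered set of averages stands in for the Gestalt.

```python
def shifts(pattern):
    """All transforms of `pattern` under the cyclic group of shifts."""
    n = len(pattern)
    return [pattern[k:] + pattern[:k] for k in range(n)]


def group_average(functional, pattern):
    """Average of a functional over every transform in the group."""
    transforms = shifts(pattern)
    return sum(functional(t) for t in transforms) / len(transforms)


# Hypothetical functionals; each assigns a number to one transform.
FUNCTIONALS = (
    lambda p: p[0],          # is the first cell on?
    lambda p: p[0] * p[1],   # are the first two cells both on?
    lambda p: p[0] * p[2],   # are cells 0 and 2 both on?
)


def gestalt(pattern):
    """Ordered set of group averages, invariant under cyclic shifts."""
    return tuple(group_average(f, pattern) for f in FUNCTIONALS)


a = (1, 1, 0, 0, 1, 0)
b = a[2:] + a[:2]          # the same pattern, shifted by two cells
c = (1, 0, 1, 0, 1, 0)     # a genuinely different pattern

print(gestalt(a) == gestalt(b))
print(gestalt(a) == gestalt(c))
```

The shifted copy yields the same tuple of averages while the different pattern does not, which is the sense in which the averages determine an invariant.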

A matching process may compute a universal if the selected opera- 
tions are transformations (indexed by parameters) that belong to a group, 
and if there is a monotonic measure of distance between the transformed 
input object and a subset of points (or concept, according to our 
previous definition) in the space of perceptual attributes. The process 
does compute a universal insofar as it selects operations to minimize 
the distance concerned. The group properties implicit in this special 
case are more rigid than necessary but indicate the kind of restriction 
that must be applied. Insistence upon an hierarchy of control rather 
than an hierarchy of abstraction, in 2.7 and 2.8, guarantees that restric- 
tions of this kind are built into the system. 
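
The matching process of this paragraph can be sketched under the same illustrative assumption (cyclic shifts as the group), with Hamming distance standing in for the monotonic measure of distance and a concept represented as a set of points. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def best_shift(pattern, concept):
    """Select the shift that minimizes the distance to the concept.

    Returns (distance, shift index); `concept` is a set of points.
    """
    scored = []
    for k in range(len(pattern)):
        transformed = pattern[k:] + pattern[:k]
        distance = min(hamming(transformed, point) for point in concept)
        scored.append((distance, k))
    return min(scored)


concept = {(1, 1, 0, 0, 0, 0)}
print(best_shift((0, 0, 0, 1, 1, 0), concept))
```

A distance of zero means that some transform of the input falls inside the concept, so the input is recognized whatever its position.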


(8) An abstractive mechanism is a basic component of any artificial 
intelligence [28]. In Amarel’s system, expressions $\Lambda$ in $L^1$ denote 
assembly operations that build an hierarchy of more or less abstract 
programs (denoted by expressions in $L^0$). In a different sense, the 
denotation of $L^1$ is an abstract representation of some of $L^0$. (In this 
case there is a further distinction of type.) Similarly, Newell’s G.P.S. 
contains an abstractive routine. 

A number of other mechanisms have, however, been devised, many 
of them including some learning capability. Although most of these 
are oriented to visual pattern recognition, the input domain is irrelevant 
and it would be legitimate to use them for recognizing patterns of 
programs or parameter values. 

The simplest abstractive schemes involve sequences of tests which 
are conveniently represented as a test tree, in which each node cor- 
responds to the sorting of an input item. Hence the input to the pro- 
gram appears at the uppermost node of the test tree and the output con- 
sists of the selection of one of the lowermost nodes, indicating that the 
input item has passed a unique sequence of tests. Combined with list 
programming techniques, as in a program written by Banerji [127], 
rather elaborate sorting strategies can be conveniently instrumented. 
Data are retained in lists, but (at the simplest level) no real learning takes 
place. The sorting criteria are not changed or replaced as a result of the 
system’s experience of previous tests. 

The next degree of elaboration is introduced with the possibility of 
replacement and modification which appears in a very efficient pattern 
recognition program due to Vossler and Uhr [128] shown in Fig. 34(1). 

The input is a retinal matrix on which is projected a binary input 
pattern (black and white elements). The system looks for features of 
the input pattern by matching feature operations, which are pre- 
defined binary submatrices, against the input on the retinal matrix in 
all of a set of positions. At each location, a matching test is made 
between the input and the feature submatrix. Further operations 
combine the test output derived from this lowest level of the system 
to determine higher order features. The system is externally reinforced 
and the lowest feature submatrices are evaluated for their degree of 
relevance and of discrimination. Useless features are discarded and 
other features provided to replace them. (One feature generating 
algorithm is to copy some region of an input pattern as a test operation 
which ensures that at least one test is passed for one input pattern.) 
Depending upon the feature generating rules, the system may (or may 
not) be called evolutionary. If there is a mechanism for variation and 
recombination of the existing features and if there is some form of 
economic or competitive constraint, whereby the system is forced to be 
provident regarding the number of features used, by levying a cost 
for their structural maintenance, then it probably is evolutionary. 
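
The lowest level of such a program can be sketched as follows. This is an illustrative reconstruction from the description above, not the Vossler and Uhr code; the function names and the 4 x 4 retina are hypothetical. A binary feature submatrix is matched at every position on the retina, and a new feature is generated by copying a region of an input pattern.

```python
def match_positions(retina, feature):
    """Positions (row, col) at which the feature submatrix matches exactly."""
    rows, cols = len(retina), len(retina[0])
    fr, fc = len(feature), len(feature[0])
    hits = []
    for r in range(rows - fr + 1):
        for c in range(cols - fc + 1):
            if all(retina[r + i][c + j] == feature[i][j]
                   for i in range(fr) for j in range(fc)):
                hits.append((r, c))
    return hits


def copy_region(retina, r, c, fr, fc):
    """Generate a feature by copying an fr x fc region of an input pattern."""
    return [row[c:c + fc] for row in retina[r:r + fr]]


retina = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
corner = [[1, 0],
          [1, 1]]

print(match_positions(retina, corner))

copied = copy_region(retina, 0, 1, 2, 2)
print(match_positions(retina, copied))
```

A copied feature is guaranteed to match the pattern it was taken from at least once, which is the property noted in the parenthetical remark above.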

The Vossler and Uhr program closely resembles the parallel system 
“pandemonium” which was devised somewhat earlier by Oliver Self- 
ridge [114]. A typical “pandemonium” is shown in Fig. 34(2). The 
lowest level elements (which may either be subcontrollers interacting 
with their environment or the feature recognition programs we shall 
assume for the present discussion) are whimsically referred to as 
“demons.” These demons are supposed to perform a parallel compu- 
tation and their joint output, indicating the attribute values of the 
environment, is abstracted by higher order or middle ‘‘demons.” At 
this level, the weight attached to the output of each demon may be 
adaptively modified. The resulting signal, from the middle demons, 
is conveyed to a decision-making system which uses this evidence to 
support one of several alternative hypotheses about the state of the 
environment (or, if the environment is a retina, about the pattern 
displayed on this retina). The connections between the middle demons 
and the decision elements are weighted (and these weights will certainly 
be adjusted by feedback controlled adaptation). The feedback or 
success information, delivered by the decision element, is also supplied 
to a process which selects lowest order demons. This typically parallel 
process resembles the information feedback in Amarel’s system, to 
each of the active decision nodes. 
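
A toy version of this arrangement is sketched below. It is a simplification in which the middle layer is folded into one weight vector per hypothesis, and the weight adjustment is a perceptron-style rule swapped in for illustration; Selfridge's own adaptation scheme is not reproduced. The class and method names are hypothetical.

```python
class Pandemonium:
    def __init__(self, demons, hypotheses, rate=0.5):
        self.demons = demons                      # lowest order feature demons
        self.weights = {h: [0.0] * len(demons) for h in hypotheses}
        self.rate = rate

    def shouts(self, x):
        """Joint output of the lowest order demons for input x."""
        return [d(x) for d in self.demons]

    def decide(self, x):
        """Decision element: pick the hypothesis with the largest support."""
        s = self.shouts(x)
        support = {h: sum(w * v for w, v in zip(ws, s))
                   for h, ws in self.weights.items()}
        return max(support, key=support.get)

    def reinforce(self, x, correct):
        """Feedback-controlled adaptation of the weights (assumed rule)."""
        guess = self.decide(x)
        if guess != correct:
            for i, v in enumerate(self.shouts(x)):
                self.weights[correct][i] += self.rate * v
                self.weights[guess][i] -= self.rate * v
        return guess == correct

    def worth(self):
        """Crude usefulness of each demon: total absolute weight on it."""
        return [sum(abs(ws[i]) for ws in self.weights.values())
                for i in range(len(self.demons))]


demons = [lambda x: x[0], lambda x: x[1], lambda x: 1]   # last one is uninformative
net = Pandemonium(demons, ["left", "right"])
data = [((1, 0), "left"), ((0, 1), "right")]
for _ in range(5):
    for x, label in data:
        net.reinforce(x, label)

print(net.decide((1, 0)), net.decide((0, 1)), net.worth())
```

In this run the constant demon ends with zero worth, so it is the natural candidate for deletion and replacement by the selection process.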

As in the Vossler and Uhr [128] system, some of the features computed 
by the lowest order demons will be acceptable whereas others 
are likely to prove useless. The latter must be discarded and replaced 
and even the component features may be modified with advantage. 
Consequently, depending upon the reports received from the over-all 
decision making element, these feature detecting programs are deleted 
or altered. 

They could, of course, be improvidently replaced by chance variation, 
but several alternatives are possible. In the simplest case, it may be 
sufficient to preserve some of the experience gained by the system by 
creating new demons from parts of previously successful demons. This 
is a method of “recombination” of parts. An alternative procedure is to 
exploit the cooperative interaction that can be encouraged between 
members of the lowest order demon population if they are provided with 
a suitable language for communication. Cooperation will take place, as 
in 2.8, if these lowest order demons aim to survive in a partially com- 
petitive environment. In this case, the feedback from the over-all 
decision-making component is used to determine certain evolutionary 
rules (rate of aging, payoff distribution, or the parameters cited in 2.8). 
The important point is that the demon population is a program (or a 
set of programs, one for each demon) that is embedded in an over-all 
program structured to sustain evolution. The species of demon that 
survives will (i) be able to thrive in the conditions that are maintained 
by the over-all decision maker and (ii) will be able to interact co- 
operatively with other members of its species. Regarding these demons 
as constituents of a program, cooperative evolution implies that its 
elaboration is maximized. The cooperation rule may engender further 
advantages by way of computational stability. 

Finally, there are abstractive mechanisms like EPAM. 

The original EPAM program was devised by Feigenbaum to simulate 
the memory and recall of items such as word or pattern lists. An 
input object (or pattern) is processed by a tree of feature tests and its 
image A is assigned to a certain terminal node of this test tree, say C. 
Suppose that C is already characterized by an image B (of one or more 
previously processed inputs). The present image A is matched against 
B, and if A is identical to B then A is assigned to C. On the other hand, 
if a difference is detected, a further test D is constructed to distinguish 
between A and B. Now the image A is assigned to one terminal node 
of D and the image B to the other. The test D is finally attached to the 
original test tree at the node C and the test develops in this fashion. 
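
The growth of the test tree can be sketched as follows. This is a reading of the paragraph above and not Feigenbaum's program: images are assumed to be equal-length strings, each test inspects one character position, and the names `Node`, `sort_to_leaf`, and `learn` are hypothetical.

```python
class Node:
    def __init__(self, image=None):
        self.image = image    # image stored at a terminal node
        self.test = None      # position tested, if this is an internal node
        self.branches = {}    # test outcome -> child Node


def sort_to_leaf(root, image):
    """Pass the image down the tree of tests to a terminal node."""
    node = root
    while node.test is not None:
        value = image[node.test]
        if value not in node.branches:
            node.branches[value] = Node()
        node = node.branches[value]
    return node


def learn(root, image):
    """Assign the image to a terminal node, adding a test if it clashes."""
    leaf = sort_to_leaf(root, image)
    if leaf.image is None or leaf.image == image:
        leaf.image = image
        return leaf
    old = leaf.image
    # Construct a new test on the first position at which the images differ.
    d = next(i for i, (a, b) in enumerate(zip(image, old)) if a != b)
    leaf.test, leaf.image = d, None
    leaf.branches = {old[d]: Node(old), image[d]: Node(image)}
    return leaf.branches[image[d]]


root = Node()
for item in ["DAX", "DOG", "CAT"]:
    learn(root, item)

print([sort_to_leaf(root, item).image for item in ["DAX", "DOG", "CAT"]])
```

Each clash between a stored image and a new one adds exactly one test, so the tree grows only as far as is needed to keep the items apart.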

Later versions of EPAM, developed by Feigenbaum [122], use a 
parallel type of associative memory (which, in the present design, is 
simulated as a sequential mechanism by dint of association lists). 

Suppose the environment contains composite objects made up from 
relatively familiar simple objects. (We assume that feature recognition 
has already taken place.) Thus the input may be a composite object 
m which includes a pair a and b of simple objects that are members of a 
list of possible simple objects a, b, c, d. As before, the system derives 
images A, B, C, D of simple objects a, b, c, d, and in the case of m as an 
input it will derive an image M with component images A and B. The 
image M is now associated with the images A, B, which have been 
produced by learning a variety of composite objects having these con- 
stituents, for example, with an object p containing b and d with an 
image P that is associated (like the image M) with the image B, but 
not, in this case, with A. Hence the process of abstraction is inter- 
mingled with memorization and involves building associations between 
relevant subsets of images. (The novel composite object is abstracted 
and memorized in the context of the familiar images of its constituents.) 
The contextual plan is particularly evident in the reprocessing of data 
when the images of partial objects act as cues that recall sequences 
of other images from the association system. As required by the match- 
ing paradigm of difference learning, essentially the same system is used 
for abstraction and for the synthesis of operations. 


3.3 Intelligent Artifacts 


(1) The apparent gulf between the physical artifacts cited in 2.6, 
2.7, and 2.8 and the computer simulations of artificial intelligence is 
filled by relating the states of an artifact to the symbolic entities of 
programs and problem environments. Briefly, we need to render a 
sensible correspondence between signals and messages. Mere identifi- 
cation presents little difficulty (a simple construction is given in 2.6 
and in 2.7) where a system is related to an identified language, although 
fairly elaborate procedures are sometimes needed, for example, in 
denoting the stable modes of Beurle’s [68, 69], Agalides’ [129], Caianiello’s 
[15], or Farley’s [70, 71] system as signs in a processing language. (One 
method involves a wave guide structure that delivers a characteristic 
plane wave front to a block of medium having an artificially elevated 
average threshold level, in which the input wave is attenuated to a 
characteristic excitation of a single element of the medium.) 
However, this is only a part of the tale. Whatever the identification, 
the symbolic structure must reflect the form of the underlying mech- 
anism. In this respect a physical system is much more restrictive than 
a computer simulation. This may or may not be an advantage. 


(2) Any competent artificial intelligence must compute universals, 
and the chance that an arbitrary set of abstractions and an arbitrary 
set of operations and synthetic procedures will furnish us with a system 
that does compute universals is remote. Do the restrictions imposed by 
physical and mechanical laws help us in this respect? Is there a greater 
likelihood that our systems will compute universals if we abide by these 
laws? 

In principle, the answer must be “yes.” A brain, after all, is a physical 
artifact and any brain (barring, perhaps, the simplest) does compute in 
the desired fashion. But in practice, the effect of physical restrictions 
upon the design of an artificial intelligence has only been considered 
in special cases. Greene [76, 77], for example, discusses the requirements 
of mentation (pointing out, among other things, that an artificial 
intelligence has a specific organization such as the kind proposed by 
Wiener [7] and that it must aim to impose its organization upon its 
environment). He goes on to suggest certain analogies between the 
current mathematical analysis of physical systems and the entities 
that characterize the transformations executed in the artificial intelli- 
gence. In fact, Greene is chiefly concerned with non-linear oscillatory 
networks of the kind considered by Beurle and Farley, but on the 
assumption that the behavior of a large ensemble of these systems can 
be approximately characterized by a set of linear equations. He seeks to 
establish relationships between the quirks of symbolism that appear as 
necessary features of his model of mentation and certain modes of 
oscillation and their properties. Thus, symbols must carry implicit 
information about the distribution of other signs as well as acting as 
marks (like the “images” in Feigenbaum and Simon’s program which 
bear contextual as well as specific data), and it is argued that special 
oscillatory modes do have this property with respect to the distribution 
of other nodes. Again, having identified symbols with a suitable set of 
oscillatory modes, the stable (or resonant) modes of oscillation evocable 
from the system for certain parameter values are analogous to the 
Gestalten of perception. 

Although this work is very stimulating, a more general approach is 
probably needed. Is it possible, for example, to build a statistical 
mechanics of computing systems and physical systems alike? If so, the 
gulf between signals and messages is filled in a curiously elegant fashion. 
Work is in progress in various quarters; but although its direction has 
been indicated, for example, by Wiener in his recent lectures at Naples, 
there are no publications, so far. A still more generalized approach is 
adopted by Churchman [136] and, in a very different way, by Petri 
[137] and Gunther [132]. 


(3) The processing languages of a competent artifact are almost 
certainly open ended, not only in the sense that terms are added to the 
vocabulary but also in the sense that the meaning attached to the 
existing symbols is changed. Hence, the descriptive framework we have 
adopted with languages $L_i^n$ could be more accurately (but less clearly) 
replaced by a language that evolves. 


(4) The point is illustrated by an artificial intelligence due to Fogel 
[133]. The meaning of a symbol is its denotation of an interval (between 
a pair of threshold values) along the coordinate of an input variable $x$. 
Thus $i$ may be assigned to values of $x$ between the thresholds $T_i$ and 
$T_{i+1}$. If the input sequence $x(t)$ is nonstationary, Fogel argues that, 
in order to satisfy a number of plausible criteria, such as 


(i) maintaining the transition probabilities between symbol sets 
within bounds, so that the symbols are useful elements in a proba- 
bilistic model and 

(ii) maintaining a reasonably informative correspondence between 
the symbols and the events they denote, 


it is necessary to adjust the values of the $T_i$ and, consequently, the 
meaning of the symbols so that the probabilities of symbol occurrence, 
$p_i$, are roughly equalized. One strategy for equalizing the $p_i$ by constructing 
a sequence $T_i(t)$ is described in a recent paper. 
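
One simple way to equalize the symbol probabilities is to place the thresholds at empirical quantiles of a recent window of inputs. The sketch below uses that rule purely as a stand-in; it is not the strategy of Fogel's paper, and the function names are hypothetical.

```python
from bisect import bisect_right


def thresholds(window, n_symbols):
    """Place thresholds at empirical quantiles of the recent inputs."""
    ordered = sorted(window)
    return [ordered[(k * len(ordered)) // n_symbols]
            for k in range(1, n_symbols)]


def symbol(x, ts):
    """Index of the interval (between adjacent thresholds) containing x."""
    return bisect_right(ts, x)


def symbol_counts(xs, ts, n_symbols):
    """How often each symbol occurs over the inputs xs."""
    counts = [0] * n_symbols
    for x in xs:
        counts[symbol(x, ts)] += 1
    return counts


early = list(range(0, 100))      # inputs before the drift
late = list(range(100, 200))     # inputs after the drift (nonstationary source)

t_early = thresholds(early, 4)
t_late = thresholds(late, 4)

print(symbol_counts(early, t_early, 4))   # roughly equal occupancy
print(symbol_counts(late, t_early, 4))    # stale thresholds: one symbol dominates
print(symbol_counts(late, t_late, 4))     # readjusted thresholds restore balance
```

With stale thresholds every late input falls under a single symbol, so the symbols carry no information until the thresholds, and hence the meanings of the symbols, are moved.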


3.4 Some Difficulties 


(1) In each of the systems we have examined, the problem is posed 
and the goal is specified in $L^1$. It would be possible to provide a further 
language $L^2$ capable of expressing different kinds of problem and alternatives 
(in the sense of different kinds of problem) and different forms 
of goal. This expedient has not, however, been adopted although Newell 
discusses the matter. (Newell points out that G.P.S. is a machine that 
works in one “sense modality” and he cites the need for a higher order 
problem environment, denoted by $L^2$, for a comprehensive goal-seeking 
activity.) 

In fact, failure to represent a universe of goals within the artificial 
intelligence is tantamount to viewing the artificial intelligence as a 
calculator. The idea of giving the system advice by way of heuristics 
is fictional. Really, we tell it what to do. 

This criticism can be partially avoided when simulating a population 
of problem solvers communicating in $L^0$ and in $L^1$. George [134, 135], for 
example, has programmed a variety of game playing systems that interact 
with one another. In some of them the play involves a choice between 
types of action and types of outcome, and it is necessary to allow for 
communication of these choices in a language $L^2$ over and above $L^0$ and 
$L^1$. This is particularly true if the game playing machines are required 
to settle whether or not they will cooperate with one another and, if so, 
to bargain over acceptable terms. On the other hand, George must 
determine the sort of population that he is considering and must embed 
certain common criteria of success in each member (otherwise com- 
munication between the members could not take place). 

To what extent does this commit us to viewing the population as a 
set of calculators? 

The crucial feature seems to rest upon the way that the hierarchy 
of languages is constructed. Like $L^0$ and $L^1$ in the systems we examined 
in 3.2, the languages will be “open ended” in the sense that transformations 
“$L_i^n \to L_{i+1}^n \to \ldots$” can occur as a result of experience. In a certain 
sense, no transformation of this kind is able to create an essentially 
different problem solver (for no further axioms are introduced into the 
logical system which the language denotes). On the other hand, a 
transformation of the form “$L_i^n \to L_i^{n+1} \to \ldots$” can and, except in very 
special cases, must involve such a distinction for the metalanguages 
$L^{n+1}$ denote a system embodying certain axioms that are absent from 
$L^n$. If the population opts out of the game altogether or if some members 
decide to play a different game using rules and reinforcements that the 
experimenter had not recognized, then such a transformation can be 
inferred. Although it is convenient to envisage members of a population 
of machines that cooperate with one another when solving the problems 
posed by a common environment (communicating at various levels in 
order to achieve cooperative activity), the same comments apply to 
cooperative interaction (and the necessary degree of communication) 
involving functional parts of a single mechanism. But, if this mode of 
interaction exists within an artificial intelligence, then there must be 
some kind of parallel computation. 

This, in my view, is the chief significance of the sequential or parallel 
dichotomy of 3.1(5). If there is parallel organization, as there is, for 
example, in a “pandemonium,” then cooperation may occur and the 
set of expressions that are messages communicated between the 
parallel components at any instant determine the name of a Gestalt. 
Whether or not a parallel organization does, on some occasions, require 
a parallel mechanism is an open issue. 


(2) Any cooperation depends upon some kind of communication 
between members of the population. When each member is able to 
build its own interpretative and synthetic programs, stable cooperation 
depends in a critical but marginally understood fashion upon com- 
munication at different levels. 

There must be a level of discourse (level $L^1$ perhaps) that conveys 
instructions and intentional statements (in contrast to the object 
language expressions in $L^0$). Since A can adopt different “views” of the 
environment and different “attitudes” to the environment, cooperation 
with B is impossible unless A can inform B of what these are, by dint 
of expressions in $L^1$. If A and B are jointly matching a collection of 
objects, for example, the process can only take place if these objects 
are commonly represented. Otherwise they cannot be compared. 


(3) Maron [57] points out that rules of logic and of sign substitution 
are constraints that determine what cannot be done. They do not guide 
a system in selecting what should be done. In particular they are not 
useful decision rules. 

By providing a linguistic hierarchy, we give an artificial intelligence 
a framework in which it can abstract from the state of its environment 
and synthesize programs that select among relevant environmental 
operations. By introducing heuristics, we ensure that the system is not 
utterly stupid, that is, concepts (in the sense of 3.1) are intelligible, and 
we remove as much uncertainty as possible. However, some issues are 
undetermined and, if they are encountered, a decision is required. 
(In fact, “decisions” are needed quite often, to choose between plausible 
alternatives. But we shall emphasize the ‘‘undecidable” situation where 
no substitution rule is available.) 

Now, as Maron argues, none of this structure determines what should 
be done under these circumstances. However, a problem solver that 
survives must select some operation and an artificial intelligence must, 
in addition, communicate its choice. Over the ensemble of systems, no 
alternative is favored. Hence, for any one system, A, the situation is 
resolved by an arbitrary selection $R_A$ of a sign $V_A$ from a relevant 
vocabulary. Of course, the selection made by $R_A$ is individually significant. 
It is an index of individual A preference, for operations or for 
forms of program, depending upon the vocabulary. Suppose $V_A \in V^1$ 
so that $V_A$ is a statement of an A preference of the latter kind. 

To show this, imagine a couple of systems, A and B, living in a common 
problem environment and communicating in $L^0$ and $L^1$. A problem $M$ 
is posed (an expression in L^0) and one member of the pair, A, adopts 
a strategy denoted in L^1 as A_j (system B is informed of A_j in L^1). At 
some point, A_j terminates undecidably since no substitution rule is 
available. The undecidability is also manifest to B. Hence, when R_a 
selects V_a ∈ V^1 to resolve the situation, B interprets V_a as an A 
“preference.” (It is one case of a mapping from the states of R_a into a 
subset of V^1.) To compare “preference,” B selects A_j if A selects A_j, 
and the denotation of V_a is matched, by B, against the denotation of 
V_b (selected by R_b). 
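
The resolution of an undecidable situation by arbitrary selection may be sketched in a few lines of Python. This is an illustration only, under assumptions of our own: the names `System`, `rules`, `vocabulary`, and `resolve` are hypothetical, and a seeded pseudo-random generator merely stands in for the selector of each system.

```python
import random


class System:
    """A problem solver with substitution rules and an arbitrary selector.

    Hypothetical illustration: `rules` maps a situation to a sign, and
    `vocabulary` is the set of signs the selector may choose from.
    """

    def __init__(self, name, rules, vocabulary, seed):
        self.name = name
        self.rules = dict(rules)
        self.vocabulary = list(vocabulary)
        self._selector = random.Random(seed)  # stands in for the selector R

    def resolve(self, situation):
        """Return (sign, decided): apply a rule if one exists, else choose."""
        if situation in self.rules:
            return self.rules[situation], True
        # Undecidable: no substitution rule is available, so select arbitrarily.
        return self._selector.choice(self.vocabulary), False


vocabulary = ["v1", "v2", "v3"]
rules = {"s0": "v1"}
a = System("A", rules, vocabulary, seed=1)
b = System("B", rules, vocabulary, seed=2)

# Decidable situation: both systems are bound by the same rule.
decided_case = (a.resolve("s0"), b.resolve("s0"))

# Undecidable situation: each choice is an index of individual "preference".
sign_a, decided_a = a.resolve("s9")
sign_b, decided_b = b.resolve("s9")
preferences_match = sign_a == sign_b
```

The rules constrain both systems identically, so only the undecidable case can reveal anything individual; comparing `sign_a` with `sign_b` is the matching of denotations described above.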


(4) The need to maintain a certain rate of action, or rate of application 
of operations, is derivable from the need to maintain a positive rate of 
adaptation. The latter requirement is a consequence of the isomorphism 
between the organization of an artificial intelligence and a type (iv) 
self-organizing system. At some point, the association learning of 3.2(3) 
(which is an admissible form of adaptation) must give place to the 
“concept learning” of 3.2(3) (because, as in 2.6, a stable self-organizing 
system must change the domain with reference to which it adapts its 
behavior, in the present case, its symbolic behavior). In terms of “open- 
ended” languages, the transformations of concept learning may be either 
(i) denotative, L^n_t → L^n_{t+1}, when a novel sign, V, and its denotation are 
adjoined to V^n_t to form V^n_{t+1}; or (ii) constructional, when L^n_t → L^{n+1} 
due to embodying one or more novel axioms. Mode (i) corresponds to 
the change of attention and mode (ii) to the metalinguistic construction, 
considered in 2.7 and 2.8. 


(5) There is a necessary and important distinction between the 
approach of a biologist and a logician (or a scientist who programs a 
computer) when faced with the issues of artificial intelligence or of 
self-organizing systems. We have argued that the effective construction 
of an artificial intelligence or a self-organizing system will always 
involve an artifact able to build an hierarchy of metalanguages. Now 
the biologist and the computer-oriented logician are both concerned 
with a realizable artifact. But, in a certain sense, the biologist can take 
the whole of natural evolution to be the artifact concerned and the 
individual brain one member of a specific class of end products. This 
allows him, in fact, to regard the construction of an hierarchy of meta- 
languages as plausible and realizable. Most of the construction occurs 
in natural evolution. Some facet of this capability is embodied in the 
medium of an individual brain. 

The computer-oriented logician cannot accept this point of view. In 
a sense, his program must exhibit both the historical or maturational 
and the immediate proclivities of an intelligent or a self-organizing 
system. Now if the program is to be realized, there must be some way of 
limiting the proliferation of the metalanguages that are needed, in our 
formulation, to express the distinct levels of control or instruction. 

As Gorn [136] points out, the expedient that is normally adopted 
consists in using a “mixed language” capable of expressing various 
instructions (control instructions, descriptions, object designations) 
in place of an hierarchy of metalanguages. (The hierarchy corresponds 
to a stratified and restricted system.) Gorn also points out that the 
advantage gained by such a system of “unstratified control” is bought 
at the price of a certain “pragmatic ambiguity” in the sense that some 
expressions in the mixed language are open to various interpretations. 

Now we comment that the biologist and the computer-oriented 
logician are not really at variance. The brains examined by a biologist 
do exhibit unstratified control and the languages in which they com- 
municate internally or externally, must be deemed mixed languages. 
Only the biologist is at liberty to regard a brain or the whole gamut of 
presently considered brains as one stratified subsystem of a large 
stratified evolving system; whereas the computer-oriented logician, 
lacking this possibility, must give explicit consideration to the mixed 
language in which he describes his program. 

Similarly, we may use the necessity for some explicitly developing 
hierarchy of metalanguages, or some explicitly “mixed” language and 
the use of “unstratified” control to counter the criticisms of 1.1. “Prag- 
matic ambiguity” and, indeed, some embodiment of paradox are 
necessary and completely harmless consequences of realizing any 
artificial intelligence. Their appearance is a structural matter of fact 
and is no argument against the conception of an artificial intelligence or 
a self-organizing system as such. 


4. Other Disciplines 


4.1 Psychological Models 


Since any realizable logic comes within the compass of psychology, 
there is an obvious relation between psychological theories of problem 
solving, learning, and perception and the design of an artificial intelligence. 

At the most abstract level, any well-validated, man-made, heuristic 
is a candidate for embodiment in an artificial intelligence. Conversely, 
any heuristic that works in an artificial intelligence constitutes a 
testable hypothesis about mentation. 


*It is always possible, in principle, to obtain an unambiguous statement of what 
goes on. In the “pure” case of logic, the paradoxes are avoided by introducing a distinction 
such as the type distinction, in the theory of logical types. Similarly, in the present 
case we can (as in 2.8) introduce a distinction of logical type to avoid “mechanical” 
ambiguity (when the descriptive metalanguage for the system must contain a theory 
of types or its equivalent). There is, however, no need to do this. The distinction would 
serve our own ideas of completeness rather than the functioning of the system. 

At the level of organization, Hovland’s [137] definition of a concept 
is isomorphic with the widely acknowledged construction we have 
adopted. Using this idea, Hovland and Hunt [138] have devised com- 
puter programs which may equally well be interpreted as a simulation 
of concept learning in a human being and as parts of an artificial 
intelligence. Similarly the TOTE hierarchy used by Miller, Pribram, 
and Galanter as their descriptive framework for mental activity is 
isomorphic with the hierarchical organization embedded in an artificial 
intelligence. The fact is that psychologists and computer logicians are 
now using the interdisciplinary terms of cybernetics to describe their 
problems and their results, and the fact that more is gained from the 
exercise than a mere change of nomenclature is a welcome justification 
of this science. 

These analogies are fairly recent. Many others exist; for example, the 
mass of work due to Piaget [140] (Flavell [139]) upon the maturation of 
various faculties in the human being (the ability to appreciate the persis- 
tence of objects, to detect invariants, and to comprehend number) provides 
a myriad of clues to the development of these faculties in an artificial 
intelligence (and although we have not yet stressed the education of 
machines to reach an acceptable standard of competence, this is one 
of the most important issues). Bartlett’s [141] work on memory and 
Craik’s [142] on perception suggest the proper choice of the similarities 
and invariants of 3.2(7). The ethologists, starting from the empirical 
foundations laid by Tinbergen [143] and Lorenz [144], provide rules 
for hierarchical organization in any system and its environment that 
must, it seems, be obeyed by either organisms or machines. Hull [145] 
and later Hebb [146], Broadbent [147], Brown [148], Gregory [149], 
Mackworth [150], Milner [151], Barbizet [152], and Welford [153] are 
among many psychologists with a mechanistic bent who have, in fact, 
described certain subsystems of an artificial intelligence. Finally, the 
experimental methods of psychology have influenced and have been 
influenced by the experimental situations used to test an artificial 
intelligence. The very different methods of Skinner [154, 155] (rein- 
forcement learning and behavior shaping), Harlow [156], and Bruner 
[157] [as in the system of 3.2(2)] are applied to the artifacts while, on 
the other hand, the study of these artifacts is gaining admission as a 
proper concern for comparative psychology. 


4.2 Physiological Models 


There is no necessary connection between the physiological mechan- 
isms in a brain and the mechanical structures in an artificial intelligence. 
Thus a brain is largely a parallel computer and, although it is safe enough 
to assert that the organization of an artificial intelligence is also parallel, 
this fact need not imply a parallel mechanism. This point of view can 
be justified even if we insist upon functional identity between the 
behavior of a brain and of an artificial intelligence. 


(1) Consider an artificial intelligence A and a brain F which are sup- 
posed to have some behavioral property P that is detectable by tests T_P. 
Now we know that P is, at the moment, ostensively defined and that 
we cannot actually list all of the attributes of P or all of the tests T_P 
that may be relevant. However, the activity of the artificial intelligence, 
A, may be known to depend only upon a collection of subsystems B 
that compute a function R, while the activity of F may be known to 
depend only upon subsystems G (such as neurones) that also compute R. 
Now if there are tests T_R which exhaustively specify R (in the sense that 
R is a consequence of certain basic physical principles) we may, while 
aiming for behavioral identity of A and F, replace the subsystems B by 
any other convenient subsystems that compute R. If, for example, R 
is “AND” and the original B units are thermionic “AND” circuits, 
these can be replaced by their transistor analogs. 
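
The replacement argument can be made concrete for the “AND” example. The following Python sketch is ours and purely illustrative: the two unit functions are hypothetical stand-ins for the thermionic and transistor circuits, and the truth table plays the part of tests that exhaustively specify the function.

```python
from itertools import product


def and_valve(x, y):
    """Hypothetical stand-in for a thermionic AND circuit."""
    return x and y


def and_transistor(x, y):
    """Hypothetical stand-in for a transistor AND circuit (another mechanism)."""
    return bool(min(int(x), int(y)))


def exhaustive_test(unit):
    """The truth table completely specifies AND, so these tests are exhaustive."""
    return all(unit(x, y) == (x and y)
               for x, y in product([False, True], repeat=2))


def system(unit, inputs):
    """The system's behavior depends only on the function its units compute."""
    result = True
    for value in inputs:
        result = unit(result, value)
    return result


# Both units pass the exhaustive tests, so one may replace the other.
both_pass = exhaustive_test(and_valve) and exhaustive_test(and_transistor)
inputs = [True, True, False, True]
same_behavior = system(and_valve, inputs) == system(and_transistor, inputs)
```

The substitution is licensed only because `exhaustive_test` covers every input; where no such exhaustive specification exists, passing a finite set of tests would not justify the replacement.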

But the argument only holds true if R is exhaustively defined. If 
certain features of F, other than those revealed by the T_R, could be 
relevant to P, then B should be made as nearly like G as possible. At 
the level of neurones, for example, it would be injudicious to assume 
too much about R. The unitary organization may, in fact, involve 
glial cells, as proposed by Hyden [158], Galambos [159], and others. The 
physical events that act as signals may be impulses or phase relations 
between impulses or the average rate of impulses as argued by Mount- 
castle. On the other hand, it would be equally injudicious to imitate 
a brain at the biochemical level. Almost certainly it is possible to 
achieve the exceptionally stringent objective of behavioral identity 
between A and F by using a more convenient material than protein. 
This is certainly true if behavioral identity is relaxed to similarity. All 
the same, the brain is probably the best provider of design principles 
and of the hunches that guide intelligent guesswork. 

A detailed survey of brain models and their empirical validation 
would be out of place. We shall, however, select a few cases to illustrate 
the interaction between physiology and the artifacts. 


(2) The over-all principles of a type (iv) self-organizing system are 
supported by broadly specified models such as an integrative structure 
proposed (and largely verified) by Anokhin [160]. The role of unspecific 
oscillatory mechanisms (of the kind simulated by Beurle, Caianiello, 
and Farley) is confirmed by Magoun’s [161] work. 
Recent experiments by Jasper [162], McCulloch and Kilmer [163], 
and others substantiate the existence of attention mechanisms embodied 
in the reticular formation analogous to the mechanism which we 
argued to be a necessary part of the artificial system. Braines and 
Napalkov and Setchvinsky [164, 165], and Napalkov [166] have unearthed 
an hierarchical organization of reflex systems, open to some modifi- 
cation by conditioning, which corresponds to an hierarchy of algorithms. 
Bishop [167], using the data of comparative physiology, argues that 
a brain is an orderly and hierarchical concatenation of more primitive 
brains, some of which no longer serve their primitive functions. (This 
structure will almost certainly exist in any artifact that evolves.) 
Finally, at a more detailed level, Uttley [26, 27] pioneered the hypo- 
thesis that a brain is analogous to a conditional probability machine 
(which he demonstrated by building a number of artifacts) and that 
learning entails the reduction and differentiation of connections 
between its components. 


(3) Uttley’s model lies in between the physiology of over-all plans 
and a set of fairly specific models for brain activity. In the latter con- 
nection, the mechanical consequences of Pavlov’s [168] pioneering work 
on systems of conditioned reflexes have been exhibited by Grey 
Walter [91]. The problems of coding have been examined by Barlow 
and Donaldson [169] and by Agalides [170]. Specific feature detectors, 
realized as neurone networks that act as filters, are demonstrated by 
Lettvin, Maturana, McCulloch, and Pitts [171] (frog’s eye), Hubel and 
Wiesel [172], and Reichardt [173] (the mechanism of lateral inhibition). The 
statistical histology of the brain is undergoing active investigation by 
Braitenberg [174] at Naples, while Schade [175], in Amsterdam, is 
adding to the data published by Sholl [176], which was used in Beurle’s 
simulation. The recent data in this field are available in the proceedings 
of an interdisciplinary symposium organized by Gerrard [177]. 


(4) One difficulty that besets the use and interpretation of the avail- 
able data is the fact that brains and self-organizing systems are un- 
localized automata in the sense of 2.8. Usually this implies a lack of 
correspondence between the anatomy of a brain and the functions it 
computes. Computation is distributed in the sense that one part of the 
job is done in parallel by several groups of elements such as neurones. 
Again, identical components serve different functions upon different 
occasions, and it is hard to find a tangible embodiment of the rigid 
organization that clearly exists. 

Fortunately, nature provides a few special cases in which the anatomy 
of a beast’s brain and the computation it performs are closely correlated. 
When the system is examined, the organizational picture turns out 
to be a curiously accurate replica of the ideas we have voiced. We may 
hope that these oddities of nature represent specializations in which the 
plan is preserved intact, although it is mediated by a localized and 
tractable mechanism. 

One outstanding case is the visual system in the frog, investigated by 
Lettvin, Maturana, McCulloch, and Pitts [171], which images the 
requirements of 3.2(7). There is a mapping from visual domains in the 
retina to corresponding domains in the colliculus. For each domain 
there is a neural network which filters four chief attributes of the 
state of a frog’s environment. In the context of the possible and relevant 
actions of the animal, certain classes of invariant, derived from subsets 
of points in a space with the four attributes as its coordinates, have the 
caliber of universals. 

Another case is J. Z. Young’s [178] recent analysis of the octopus, 
which shows the beast to be an hierarchically organized homeostat in 
which the different levels of matching activity involve specific sensory 
modalities, each of which has a phylogenetic significance (the tactile 
modality, for example, the simple distance reception of the eye and 
the visual pattern-recognizing system). 

Each level of homeostasis (or matching) has facilities for adaptation 
and, in a slightly restricted sense, it acts as a sign system. Finally, these 
strata are coupled by amplifying systems, and the reinforcement for 
adaptation at a given level is derived from the output of inferior strata 
which may account for the creature’s “drives” or, in the sense of 3.3, 
its “preference” for a particular outcome. In the octopus, in other words, 
the “mapping from R into V” of 3.3 is a mapping from a set of systems 
that structurally represents its phylogeny into the states of the cur- 
rently active control system. Finally, as Young [178] points out, the 
octopus is peculiar in possessing a brain with functionally localized 
parts. Hence its organization is readily detected. We may hypothesize 
that a similar organization persists in other brains where the functions 
are distributed and where their pattern is consequently obscured. 


5. The Interaction between Men and Their Intelligent Artifacts 


5.1 Present Position 


We have examined those artificial intelligences that are structured 
at the outset, by design, to a degree of competence which (in the case 
of a tangible realization of the system) is sufficient to maintain the form 
of the artifact. The design entails embedding certain basic structures 
of language (to mediate coherent communication), of organization, 
and of heuristics, stemming chiefly from logic, that are applied within 
this prescribed structure. In 4.1 and 4.2 we briefly considered the other- 
than-logical origins of certain heuristic and organizational constraints. 

Let us now discuss machines that are educated rather than designed. 


5.2 Educable Machines 


(1) As the limit, conceive a Tabula Rasa, realized as a network of 
uncommitted components, of indefinite extent. The network is either 
(i) crassly overconnected, in which case there is a mechanism, 
controlled by a parameter θ, to reduce the coupling between these 
components unless they are jointly active, or (ii) slightly 
connected, in which case θ controls the production of association 
between active subsets of these components. As Uttley points out, the 
former mechanism is to be preferred. This network is embedded in 
some environment that is of interest to and is possibly controlled by 
an observer who aims to train or educate the network by varying θ 
(the “reinforcement” variable) whenever he approves of the behavior 
produced as a result of the network’s activity. In the simplest case 
θ is binary. (The observer can approve or disapprove, allowing or 
inhibiting the consequent adaptations of this network.) 
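
The overconnected variant, mode (i), can be sketched as follows. The sketch is ours and rests on stated assumptions: the function name `adapt`, the dictionary representation of couplings, and the multiplicative `decay` factor are all hypothetical; the text specifies only that coupling is reduced unless components are jointly active and that a binary reinforcement variable allows or inhibits the adaptation.

```python
def adapt(coupling, active, theta, decay=0.5):
    """One adaptation step for a fully (over)connected network.

    coupling: dict mapping frozenset({i, j}) -> coupling strength
    active:   set of components active at this step
    theta:    binary reinforcement; 1 allows adaptation, 0 inhibits it
    decay:    assumed multiplicative reduction (not taken from the text)
    """
    if theta == 0:
        return dict(coupling)  # observer disapproves: no adaptation occurs
    updated = {}
    for pair, strength in coupling.items():
        jointly_active = pair <= active  # both members of the pair are active
        updated[pair] = strength if jointly_active else strength * decay
    return updated


components = range(4)
coupling = {frozenset({i, j}): 1.0
            for i in components for j in components if i < j}

# The observer approves twice while components 0 and 1 are jointly active.
for _ in range(2):
    coupling = adapt(coupling, active={0, 1}, theta=1)

# A disapproved step leaves every coupling untouched.
coupling = adapt(coupling, active={2, 3}, theta=0)
```

After these steps only the coupling between components 0 and 1 retains its initial strength. With n components there are n(n-1)/2 couplings to shape through a single binary variable, which indicates why the observer's task grows rapidly with the size of the array.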

Of course, this is the reductio ad absurdum of the least tractable kind 
of perceptron. For a sensibly large array of components, an observer’s 
chance of training the network is negligible. 


(2) Nobody is likely to doubt the need for some constraints though 
the form they should assume is arguable. For modest arrays it may 
suffice to provide a many-valued θ or, better still, to allow some dis- 
crimination by making θ a vector of many-valued variables. In this 
case, if the observer can detect the proximity of the behavior to his 
ideal, it may be possible to secure practicable adaptive convergence. 
Its rapidity depends upon how “good” a training routine is presented 
to the network. But there is no really adequate criterion of what a 
“good” training routine looks like, except in some special cases. 


(3) Alternatively, it is possible, as suggested in 2.8, to constrain the 
medium of this network so that computing systems are likely to evolve 
in it. Next, it may be possible to embed evolutionary rules which allow 
the observer to predict the forms of the organizations that will evolve 
or even to predict the instants when they will evolve, so that he can 
take advantage of any chance to “imprint” items of data. 

Once again, success depends upon the effective sequencing of stimuli, 
particularly those to be “imprinted,” although in this case there are a 
number of fairly efficient principles. 


(4) Finally (and perhaps in addition) the constraints can be designed 
to open up the possibilities of communication with the machine. The 
idea, in this case, is that the observer will become a participant observer 
[assuming, so far as possible, the status of one of the machines in the 
cooperative population in 3.4(2)]. Apart from communicating in L^0, to 
determine the stimuli and observe the responses of the training routine, 
the observer must bargain with the machine, which entails “preference” 
in the sense of 3.4(3), and must replace “reinforcement,” which assumes 
a predetermined “preference,” by persuasion and compromise. Ideally, 
a participant observer will aim to have a conversation with the machine 
he is educating. For this purpose, he will adapt his mode of communica- 
tion to suit the state of the machine and he will try to achieve a com- 
promise rather than a well-defined objective. (He may start off with 
some idea of an ideal inductive inference machine but he will modify 
this idea in view of his experience providing the machine performs some 
kind of inductive inference.) In order to reach a compromise, the par- 
ticipant observer must glean some information about the behavior that 
the machine prefers (that its state or its history renders more probable 
[179]). 

In fact, very little is known about the best way to introduce constraints 
that allow the necessary communication to take place. The data that 
are available come from a field that seems remote from artificial 
intelligence, namely from “man-machine interaction” studies and the 
more scientifically disposed studies of “teaching systems” [180, 181]. 
For the present discussion, we shall consider the issue within the com- 
pass of man-machine interaction. 


5.3 Dividing Labor 


There are many jobs at which man is somewhat inept. Hence, he 
uses various tools to aid him. The tools may be adjoined to his motor 
output (manual aids like cranes, pliers, and hammers) or adjoined to 
his input (microscopes, telescopes, and data processing devices). Hence, 
these tools either perform transformations of input data or output 
data. The parameters of these transformations may be invariant or 
changed at the man’s discretion as when he changes the microscope 
objective or the speed of a crane. Trivially, parameter variation is a 
result of assertions in L^1 compared to the input of assertions in L^0. 

Other tools are interpolated in man’s problem-solving or thinking 
process. They carry out procedures that the man agrees to be rational 
but which he cannot carry out himself due to limitations of computing 
rate or memory capacity. One of the earliest tools of this kind was the 
combinatorial wheel devised by Lull (see Gardner [182]) and widely used by 
the theologians of his day, for example, in ascertaining the possible 
property combinations of the angelic host. A more recent device is the 
desk calculator. 

In each case, the tool obeys the instructions delivered by its user, 
unless it is defective. Further, the user can always opt out of the situa- 
tion. He, alone, decides that the tool is relevant. The position is slightly 
different when man is not allowed this freedom and, in some adequate 
sense, subscribes to the rationality of the tool at the outset and agrees 
to remain part of a joint system while a job is completed. 

This occurs, for example, in a system examined by Edwards [183]. 
Computers are ideally suited to calculating a Bayes estimate, 


\[ p(\text{hypothesis} \mid \text{data}) = \frac{p(\text{data} \mid \text{hypothesis})\, p(\text{hypothesis})}{p(\text{data})}, \] 


which a man does slowly and with difficulty. But the machine cannot 
normally calculate the term p (data | hypothesis). On the other hand, 
a man, providing he agrees to the rationality of producing a Bayes 
estimate and providing he receives the currently calculated value of 
p (hypothesis | data), can easily appreciate the value of p (data | hypo- 
thesis). Consequently, in the system described by Edwards, there is 
a fixed division of labor between a group of men who estimate the value 
of p (data | hypothesis) from the input to the system, given the current 
value of p (hypothesis | data) and a machine that calculates this value. 
This system depends upon the fact that there is an agreed goal and the 
division of labor is fixed. However, it is easy to conceive systems in which 
these constraints are very considerably relaxed. The user may be able to 
make the machine adopt different modes of problem solving, and to 
choose these as a function of his previous interaction with the machine. 
Insofar as this entails knowledge of program assembly (in the case we 
have cited, it would not, but if a learning artificial intelligence program 
replaced the Bayes estimate program, it would), the interaction will 
entail L^1 in addition to L^0 (and in this case nontrivially). Further, if 
there is any issue of conflict between the goals of the man and the 
machine, it is difficult to say at what point the machine deliberately 
modifies the data in order to induce a certain choice on the part of the 
man. We comment that an elaborate computation hardly ever yields 
a uniquely optimum output and that a machine can learn to modify 
the data without falsifying the data. Again, if there is some conflict 
between the goals, some man-machine competition as well as co- 
operation, then the man will adopt the same persuasive gambits as the 
machine. 
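
The fixed division of labor in the system described by Edwards can be sketched as a loop in which the men supply likelihood estimates and the machine applies Bayes' rule. The sketch below is ours, not a description of Edwards' apparatus: the function names, the finite set of hypotheses, and the numerical judgments are hypothetical.

```python
def posterior(prior, likelihood):
    """Machine's share of the labor: Bayes' rule over a finite hypothesis set.

    prior:      dict hypothesis -> p(hypothesis)
    likelihood: dict hypothesis -> p(data | hypothesis), estimated by the men
    """
    evidence = sum(prior[h] * likelihood[h] for h in prior)  # p(data)
    if evidence == 0:
        raise ValueError("data are impossible under every hypothesis")
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


def run(prior, human_estimates):
    """Alternate human likelihood estimates with machine updates."""
    current = dict(prior)
    for likelihood in human_estimates:
        # The posterior just computed serves as the prior for the next datum.
        current = posterior(current, likelihood)
    return current


prior = {"h1": 0.5, "h2": 0.5}
# Hypothetical likelihood judgments for two successive items of data.
human_estimates = [{"h1": 0.8, "h2": 0.4}, {"h1": 0.6, "h2": 0.3}]
result = run(prior, human_estimates)
```

In this example both judgments favor `h1` by a factor of two, so the posterior odds move from 1:1 to 4:1. The goal and the roles are fixed in advance; neither party can alter the other's mode of operation, which is the constraint relaxed in the more elaborate systems discussed above.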

To take the process one stage further, the machine may make sug- 
gestions, phrased in L^1, about methods of problem solving. If the 
L^1 proposals advanced by the man and the machine disagree, the issue 
may be decided according to an independently computed measure of 
relative merit (rather than allowing the man to have the final word). 

A man-machine system of this kind deserves the epithet “symbiotic” 
[184]. The man-machine interaction has the logical form of a conversa- 
tion. It is perfectly true that the man is teaching the machine (for the 
machine must learn to make suggestions that suit the man, to code the 
data in a fashion he deems intelligible, and so on). But it is also true 
that the machine is teaching the man. The entire “symbiotic” system 
is an artificial intelligence that cannot be partitioned. 

From another point of view, the machine is a medium in which the 
man can exteriorize some of the mentation that normally goes on in the 
medium of his brain. Equivalently, it constitutes a medium, like the 
man’s brain, in which the computing system responsible for this 
mentation can evolve. The machine learns to become a medium of the 
most suitable kind. The man learns to gain cooperation from the machine 
by exteriorizing his problem-solving activities [189]. 


5.4 Adaptive Machines 


Systems of this kind have been fabricated and are fully considered 
in other papers [186-192]. Most of them have been used as teaching 
devices though a few have been designed as aids to performance. The 
skills and problem solving tasks embedded in these systems have, so far, 
been simple, but we have argued that there is no reason why the same 
system (referred to a more elaborate task) should not be regarded as a 
mechanism for educating an artificial intelligence. (The fact that the 
existing apparatus is biased to educate the man is a quirk of detailed 
programming). 

A number of conditions must be satisfied as a prerequisite for design- 
ing such a system. These can be asserted as axioms that determine a 
“structured skill” or “structured problem” environment. 

To satisfy these axioms L^0 and L^1 must be defined, for communica- 
tion between man and machine. The problems denoted in L^0 must 
reduce to subsets of different types of problem, these being named in 
L^1. Within each subset there must be operations that partially solve 
each problem as well as primitive classes of operations that completely 
solve each problem. Hence, for each problem type, there is a method for 
simplifying a problem. At least some of the problems will be more 
effectively solved by applying compound operations, specified by 
suitable expressions in L^1, which are not members of a primitive opera- 
tion class. (In the case of a structured skill, this amounts to insisting 
that there is at least some positive transfer of training between its 
constituent subskills.) 

Finally, assume that man and machine are self-organizing systems 
that maintain a certain rate of adaptation and, in the case of a teaching 
system, adjoin an axiom of preference (that given the chance of adapt- 
ing in a fashion that leads to the more effective performance of a skill 
the man will prefer this particular form of adaptation). If the preference 
axiom is satisfied, we call the man a student. 

Given these conditions, it is possible to construct an adaptive teaching 
machine which, in a sense that is fully considered in other papers, 
delivers an optimum instruction. The design of the simplest machine is 
isomorphic with the system of 2.5. The mechanism A selects among 
subsystems B_i assigned to problem types. The selected subsystem B_i 
selects a variously simplified sequence of problems from a problem type. 
As in 2.5 the adaptive machine aims to maximize the rate of change of 
behavioral redundancy and the preference axiom permits identification 
between this index and an index of learning rate. In the simplest case 
the adaptive machine also selects among the problem types in order to 
maximize the expected value of the learning rate. The subsystems it 
selects act as variably cooperative mechanisms which help the student 
to solve the problems they pose (by partially solving them on his behalf). 
Although, in a teaching system, there is a preprogrammed criterion of 
correct solution (which is available to the machine and is used to com- 
pute a learning rate), this is unnecessary. The correct solution to a 
problem, even if it exists, may be unknown. There must, of course, be 
rules and conditions; but, within the compass of these, the optimum 
procedure may be a matter for argument. 

At the next level, we introduce an L^1 interaction. The student is 
provided with a “bank balance” of a commodity called money, the 
value of which depends upon his average success at problem solving. 
When A is instructed to select a new B_i, the student is also asked to 
select the new B_i he prefers and his selection is an assertion in L^1. The 
preference exhibited by the student is weighted according to the current 
value of his “bank balance” and is added to the corresponding selection 
probability computed by A to yield a compound vector. The outcome, 
or actual selection, depends upon this compound vector in such a way 
that the student’s degree of control over the system depends upon his 
success. 
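
The formation of the compound vector can be sketched as below. The sketch is ours and makes assumptions the text does not settle: the preference is represented as a weight added to one component, the compound vector is renormalized, and the actual selection is drawn at random from it. The names `compound_selection`, `machine_probs`, and the subsystem labels are hypothetical.

```python
import random


def compound_selection(machine_probs, student_choice, bank_balance, rng):
    """Combine the machine's selection probabilities with a weighted preference.

    machine_probs:  dict subsystem -> selection probability from the machine
    student_choice: the subsystem the student asserts that he prefers
    bank_balance:   non-negative weight reflecting his average success
    rng:            a random.Random instance used to draw the outcome
    """
    compound = dict(machine_probs)
    compound[student_choice] += bank_balance  # weighted preference is added
    total = sum(compound.values())
    normalized = {k: v / total for k, v in compound.items()}  # assumed step
    names = list(normalized)
    chosen = rng.choices(names, weights=[normalized[n] for n in names], k=1)[0]
    return chosen, normalized


machine_probs = {"b1": 0.5, "b2": 0.3, "b3": 0.2}
rng = random.Random(0)

# An unsuccessful student (zero balance) leaves the machine's vector unchanged.
_, poor = compound_selection(machine_probs, "b3", bank_balance=0.0, rng=rng)

# A successful student largely determines the selection.
_, rich = compound_selection(machine_probs, "b3", bank_balance=4.0, rng=rng)
```

With a balance of 4.0 the preferred subsystem's share rises from 0.2 to 0.84, so the student's degree of control grows with his success while the machine retains some influence over the outcome.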
Proceeding further, we could invoke L^2 assertions to modify goal 
selection, but in this case the preference axiom needs alteration. 


5.5 Concluding Remarks 


Empirically, the close coupling between man and machine is ade- 
quately confirmed and there seems no reason why we should not regard 
the arrangement as one method for teaching a self-organizing system to 
be an artificial intelligence. At present, the problem environment is 
restricted, but the present model can probably be enlarged to compre- 
hend any plausible universe of discourse. 

Development of the interaction correlates with a process whereby 
the self-organizing system becomes structured in the image of a man. 
In order to initiate this process, certain constraints must be built into 
the self-organizing system. It can be argued that all this structure can 
be embedded, in the same fashion, as initial constraints or heuristics. 
Maybe it can. But to choose the latter alternative would be to neglect 
a lesson learned from the brain of any sentient organism, namely, that 
maturation is of the greatest importance. On the face of it, an artificial 
intelligence is more economically created by allowing it to evolve in the 
environment it will later inhabit providing that we ensure its survival 
by building into it a set of basic and necessary capabilities. 


Glossary


Algorithm. An algorithm is any well-defined sequence of operations that are
applied to a given collection of entities or objects in order to yield, unambig- 
uously, a specified result. The entities concerned may constitute words in a 
vocabulary or signs in an alphabet (and, it can be argued that since the entities 
must be well defined they always can be identified with a vocabulary or alphabet). 
Markov [112] speaks of normal algorithms. (In a normal algorithm the process 
represented by the string of well-defined operations is reduced to a set of
elementary formulas; its alphabet is finite and its termination is defined.) Markov
conjectures that all algorithms can be normalized. This and the possibility of 
proving theorems about the existence or nonexistence of algorithms that solve 
a given class of problems are among several issues elegantly discussed by Curry 
[193]. 

In the present discussion we use the term “algorithm” without necessarily
implying a normal algorithm, which is also the usage of Braines et al. [165] and
of Napalkov [166]. Strictly, such algorithms correspond to “effective processes” 
in the sense of Church [194]. 


Attribute. A unitary property abstracted from the form or behavior of a 
physical or symbolic system. Attributes may assume several descriptive values 
and these may be identified with variables, bearing the same name, and assuming 
numerical values. Confusion occasionally arises over the usage “value of an
attribute” to mean, in fact, “value of the variable with which this attribute is
identified.” To avoid this, notice that in a 2-valued logic, the attribute “Truth”
has values only of “True” and “False” which may be denoted only by binary
variables with values “1” and “0” or “T” and “F”. This is a matter of necessity
and definition. But the attribute “roundness” may be viewed in several ways
according to the conditions of measurement and the objectives of our enquiry.
We may choose to determine only “round” and “not round”, in which case this
attribute can be identified with a binary variable. On the other hand we can, 
perfectly well, identify degrees of roundness, when this attribute may be identified 
with a many-valued variable. Hence the 2-valued case is more exactly viewed 
as a mapping which assigns to each value of the many-valued variable exactly
one value in the pair 1, 0. Adopting this interpretation, roundness, the attribute, 
is associated with a definite procedure for measurement, which defines it. Failing 
this, roundness could mean differently measured things, according to the whim 
of the experimenter. 
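
As an illustration of this point, the sketch below fixes one measurement procedure for roundness and then derives the 2-valued attribute from it by a mapping onto the pair 1, 0. The choice of measure (the isoperimetric quotient 4πA/P², which equals 1 for a circle) and the threshold of 0.9 are assumptions made for the example; the text prescribes neither.

```python
import math

def roundness(area, perimeter):
    """Many-valued variable: isoperimetric quotient, 1.0 for a perfect circle.

    This particular measurement procedure is an assumption of the sketch.
    """
    return 4.0 * math.pi * area / perimeter ** 2

def is_round(area, perimeter, threshold=0.9):
    """2-valued variable: maps each degree of roundness to exactly one of 1, 0.

    The threshold is a hypothetical choice of the experimenter.
    """
    return 1 if roundness(area, perimeter) >= threshold else 0

# A circle of radius 2 and a unit square, measured by the same procedure.
circle = (math.pi * 2 ** 2, 2 * math.pi * 2)
square = (1.0, 4.0)
print(roundness(*circle), is_round(*circle))
print(roundness(*square), is_round(*square))
```

Changing either the measure or the threshold changes what "round" means, which is the sense in which the procedure defines the attribute.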


Communication process. The process of conveying data from a transmitter 
to a receiver along a channel of communication which may be perturbed by 
irrelevant data or “noise.” The mathematical theory of communication abstracts
from the commonplace interpretation of transmitter, receiver, and channel to 
yield precise and purely mathematical conceptions. As a result each relevant 
datum is associated with a value, its “selective information,” which measures 
the extent to which a receiver’s uncertainty regarding the state of a relevant 
system would be reduced if this datum were signaled by the transmitter and were 
perfectly received. (The receiver is, of course, assumed to know the possible states of
the relevant system.) The rule governing the process whereby the transmitter signals
relevant data along the channel is called coding and is formally represented as an
assignment of one or more signs to each collection of data. Irrelevant data or
extraneous signs injected into the channel are called “noise.” It can be shown that whatever
coding is adopted a certain limit is reached beyond which no more information can
be conveyed along a given channel per sign or per interval, and this limit is called
the channel “capacity.”

This mathematical theory, due to Shannon [14], is descriptive in the sense that 
it refers, as Cherry points out [79], to an outside observer’s account of the
communication process. Other formal communication models exist which are more
broadly applicable but have fewer deductive possibilities (there is no analog for
the capacity theorem), as in Harrah [195].
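
The selective information of a datum and the notion of channel capacity can be made concrete with a short sketch. It uses the standard formulas of the mathematical theory (selective information of an event of probability p is -log2 p bits, and the capacity of a binary symmetric channel with error probability e is 1 - H(e) bits per sign); the function names are hypothetical and the binary symmetric channel is chosen only as the simplest noisy example.

```python
import math

def selective_information(p):
    """Bits of uncertainty removed when a datum of probability p is received."""
    return -math.log2(p)

def entropy(probs):
    """Average selective information over the possible states, in bits."""
    return sum(p * selective_information(p) for p in probs if p > 0)

def bsc_capacity(error_prob):
    """Capacity, in bits per sign, of a binary symmetric channel."""
    return 1.0 - entropy([error_prob, 1.0 - error_prob])

print(selective_information(0.125))  # one of eight equally likely states
print(entropy([0.25] * 4))           # four equally likely states
print(bsc_capacity(0.5))             # pure noise: nothing is conveyed
```

Note that every quantity here depends on the probabilities assigned to the possible states, which is why the receiver must be assumed to know those states in advance.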


Computation. An operation carried out upon data in order to produce the 
values of specified features or functions of this data. The idea of a communication 
channel can be extended to the idea of a computation channel. But as Cowan 
points out [197], a computing channel will reduce the information potentially
conveyed by sequences of input signs. Winograd and Cowan [198] provide a
comprehensive account of computation by finite automata and networks of
components.
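
The loss of information in a computing channel can be seen in a very small case. The sketch below, which is an illustration and not an example taken from the cited works, treats a two-input AND gate as the channel and assumes that the four input pairs are equally likely.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(counts):
    """Entropy, in bits, of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# All four equally likely pairs of input signs (an assumption of the sketch).
inputs = list(product([0, 1], repeat=2))

# The computation: a two-input AND gate maps each pair to one output sign.
input_counts = Counter(inputs)
output_counts = Counter(a & b for a, b in inputs)

h_in = entropy_bits(input_counts)
h_out = entropy_bits(output_counts)
print(h_in, h_out)
```

Three of the four input pairs yield the same output, so an observer of the output alone cannot recover the input, and the output entropy is less than the two bits carried by the input.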


Denotation. The assignment of a name or a sign to one or more physical 
objects or collections of signs. 


Frame of reference. A field of relevant physical or symbolic data denoted by
terms in a deductively manipulable system of hypotheses which are associated
with methods of proof and disproof. Thus a science, like classical physics, that is
associated with a well-defined hypothetico-deductive framework (and rules for
inference and for inductive argument and for empirical confirmation) is a frame 
of reference. But, as pointed out [16], so are many other systems. 


Information. Regarded as a measure, information is a value of data (such as 
its selective information). However, a number of value functions can be assigned
in different conditions and for different purposes. The system of Bar-Hillel [196]
and Carnap, which evaluates propositions in terms of their degree of logical
discrimination, is considered by Cherry [79]. The measure due to Fisher is given in [199]
and [200], and the information measures used in science are discussed by Brillouin in [201].
Although we do not use Fisher’s measure explicitly in the present discussion, 
a precise analysis of discursive informational statements about the value of 
statistical data would lead us to adopt this measure. 


Logical type. The idea of an hierarchy of logical types was introduced by 
Russell and Whitehead [202] to resolve various paradoxical situations (like the class 
of all classes and the paradox of self-reference) which are due to the ambiguous 
usage of terms. Each class of elements in a logical system (propositions, propo- 
sitional functions, and so on) is assigned an hierarchical-type designation and 
any function of a given type is allowed only elements of some type preceding 
it in this hierarchy as members of its domain. The concept of a type hierarchy 
has been minimized in the present discussion, largely because of our rather ped- 
antic conventions regarding hierarchies of metalanguages and our pedantic in- 
sistence upon a distinction between algorithms and heuristics (or hints about the 
class of algorithms to use). But the distinctions entailed by the type hierarchy are 
basic components in our arguments. (We may allow ambiguity in building an 
artificial intelligence program but we must not fail to recognize that we have 
allowed it and could remove it by a distinction of logical type when saying what 
this program is meant to do.)


Maturation. The process whereby the brain of an embryo develops into the
brain of an adult organism. It is essential to recall that the brain of an embryo
is coupled to its external environment throughout much of this process; hence,
maturation may be held to include normal “imprinting” of the kind that occurs
in a duckling, where the first moving object with certain broad characteristics
that is encountered within a short, physiologically determined, interval is sub- 
sequently recognized as the duckling’s parent. 


Metalanguage, object language and language. The term “language” is discussed 
in detail by Curry [193]. A formal language consists of an alphabet of signs and
certain syntactic rules for their concatenation and substitution. We have insisted 
upon identified formal languages in which the sequences of signs enjoy specified
denotations. Our insistence upon this issue parallels Gorn’s insistence upon the 
intensive as well as extensive definition of the terms used in machine languages. 
Gorn points out [136] that the intensive definition of a machine language embodies 
the control mechanism that mediates linguistic operations. (The extensive
definition is, of course, the alphabet of signs and the set of strings or sequences of signs
that can be legitimately produced by its manipulation.) 

A typical object language is the set of signs, together with the syntactic
constraints that define a channel of communication (and we may say that
communication along this channel takes place in terms of this object language). A typical
metalanguage is the language in terms of which an external observer defines this 
communication channel. 


Neurone. A cell, for present purposes, in the central nervous system of an 
animal which is specialized for conveying and producing impulses of electrical or 
chemical activity that act as signals. The active components in the central nervous 
system appear to be neurones and glial cells. The part played by glial cells, once 
thought to have no functional significance, remains in considerable doubt. Certainly 
these cells are concerned in the metabolism and maintenance of neurones and they 
may also take part in their data processing activity and in memory [158, 159].

Confining our attention to neurones, there are still many varieties and possibly 
they mediate a great many different functions. The classical picture is a cell with 
branching processes called dendrites and one main process, which may bifurcate 
terminally, called the axone. Nerve fibers are the axones of certain numerous but
specialized neurones. The dendrites and cell body of a given neurone receive 
excitation from impulses propagated along axones that terminate in their 
vicinity and form synaptic connections. The coupling at a synapse involves
chemical intermediaries (such as acetylcholine) and the incoming impulses of excitation
undergo spatial and temporal summation. When the spatial or temporal sum of 
excitation exceeds some characteristic value called the threshold of the neurone 
(the threshold, in fact, is variable) a state of excitation is engendered. This is propa- 
gated along the axone of this neurone as an impulse. The required energy is 
obtained from a local ionic transfer mechanism which is maintained against a 
potential and concentration gradient by a slower metabolic process. Propagation 
of an impulse serves to disrupt this unstable ionic equilibrium. The rate at which
impulses may be propagated is limited by the required recovery interval. In 
a cortical neurone a whole gamut of different recovery processes contribute to 
the so-called absolute refractory interval (that occurs after the neurone has been 
excited) within which no impulse sequence will stimulate it. (Later, it may only be 
stimulated by an atypical impulse sequence, and later still it returns to a normal 
state.) Recent work indicates a great deal of structure in the cell membrane and 
suggests that the summative picture of a synapse is a very crude account of the 
coupling mechanism. There is also some evidence that a neurone is, to some 
degree, an analog data processing system. According to any view, the neurone 
can be said to compute if either impulses or impulse rates are regarded as input 
signs since, in each case, its output depends upon the form of its input. 
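
The classical summative picture described in this entry can be sketched as a discrete-time threshold unit. This is a deliberately crude model under stated assumptions: time is divided into equal steps, earlier excitation decays by a fixed factor `decay`, the threshold is held constant (whereas the text notes that it is in fact variable), and the refractory interval is a fixed number of steps. All names and parameter values are hypothetical.

```python
def simulate_neurone(inputs, threshold=1.0, decay=0.5, refractory=2):
    """Return a list of 0/1 output impulses, one per time step.

    inputs: a list of lists; each inner list holds the excitations that
    arrive at the synapses during one time step.
    """
    potential = 0.0   # accumulated excitation
    rest = 0          # remaining steps of the absolute refractory interval
    output = []
    for arriving in inputs:
        if rest > 0:
            # Within the refractory interval no input can stimulate the cell.
            rest -= 1
            potential = 0.0
            output.append(0)
            continue
        # Temporal summation (decayed past) plus spatial summation (this step).
        potential = decay * potential + sum(arriving)
        if potential > threshold:
            output.append(1)      # an impulse is propagated along the axone
            potential = 0.0
            rest = refractory
        else:
            output.append(0)
    return output

# Three weak successive inputs sum temporally and fire the unit; the strong
# input that follows falls in the refractory interval and is ignored; two
# weak simultaneous inputs then sum spatially and fire it again.
print(simulate_neurone([[0.6], [0.6], [0.6], [0.6], [2.0], [0.6, 0.6]]))
```

Since the output sequence depends on the form of the input sequence, even this crude unit "computes" in the sense used above.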


Reinforcement. A badly defined word used in psychology to denote at one 
extreme some event which is said to be pleasurable or rewarding and at the other 
extreme to denote any occurrence which leads to an increased conditional proba- 
bility of response B given stimulus A if associated in some suitable fashion with 
the given stimulus-response pair, A and B. Reinforcement is used chiefly in the
latter sense, so far as machine adaptation is concerned, although different
mechanisms are involved in different systems.
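
In the latter sense, reinforcement can be illustrated by a simple linear update of the conditional probabilities of response given a stimulus. The rule used here (scale every probability by 1 - rate, then add rate to the reinforced response) is one common choice, adopted only as an assumption for the sketch; it is not claimed to be the mechanism of any particular system discussed in the text.

```python
def reinforce(response_probs, response, rate=0.5):
    """Increase the conditional probability of `response` for one stimulus.

    response_probs maps each possible response to its probability given the
    stimulus; the returned probabilities still sum to 1.
    """
    updated = {r: (1.0 - rate) * p for r, p in response_probs.items()}
    updated[response] += rate
    return updated

# Probabilities of responses B and C given stimulus A, before and after
# response B is reinforced once.
given_a = {"B": 0.2, "C": 0.8}
print(reinforce(given_a, "B"))
```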


Retina. In physiology, the collection of light-sensitive elements in the eye of
an organism. The word is used, by analogy, in pattern recognition to denote a 
collection of photoelectric cells on which an input pattern is impressed. 


Sign. The name given to those invariant features of a given class of physical 
objects or shapes which are used to denote either the class of shapes itself or some
arbitrarily chosen class. Concatenations of signs may also serve as signs. Thus 
a word, as well as an alphabetic character, may be a sign. 


Symbol. A sign and its denotation. 


Strategy. A set of actions or moves decided upon by one participant in a game, 
which he will adopt contingent upon all conditions and possible situations that
may arise in this game. The term strategy is often used in connection with mech- 
anical participants and a wider class of competitive and partly cooperative 
systems, as given by Luce and Raiffa [203] and by Howard [204]. Further, the 
decisions at various stages in a strategy or between a set of alternatively possible 
strategies may be made by a chance device. 
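
A strategy in this sense can be represented as a complete table from situations to moves, and the choice between alternative strategies can be delegated to a chance device. The sketch below assumes a hypothetical repeated game with three possible situations; the situation names, the two strategies, and the weights are invented for illustration.

```python
import random

# Every situation that may arise in the (hypothetical) game.
SITUATIONS = ("first_move", "opponent_cooperated", "opponent_defected")

# A strategy assigns a move to every possible situation, decided in advance.
tit_for_tat = {
    "first_move": "cooperate",
    "opponent_cooperated": "cooperate",
    "opponent_defected": "defect",
}
always_defect = {situation: "defect" for situation in SITUATIONS}

def is_complete(strategy):
    """A strategy must cover all conditions and possible situations."""
    return set(strategy) == set(SITUATIONS)

def choose_strategy(strategies, weights, rng=random):
    """Select one of several alternative strategies by a chance device."""
    return rng.choices(strategies, weights=weights, k=1)[0]

chosen = choose_strategy([tit_for_tat, always_defect], weights=[0.7, 0.3])
print(chosen["opponent_defected"])
```

Because the table is fixed before play begins, a mechanical participant can follow it as readily as a human one.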


Synapse. A connection that establishes loose informational coupling between 
neurones.